TOPICAL AND DOMAIN-SPECIFIC FRAMEWORKS FOR EMOTION DETECTION
AN EXPERIMENTAL CLUSTER ANALYSIS OF EMOTIONS IN REALITY TV
Word count: 16,919
Annaïs Airapetian
Student number: 01600351
Supervisor(s): prof. dr. Orphée De Clercq
prof. Luna De Bruyne
Master's thesis submitted to obtain the degree of Master in Translation
Academic year: 2019-2020
Copyright statement
The author and the supervisor(s) give permission to make this study as a whole available for
consultation for personal use. Any other use is subject to the restrictions of copyright, in
particular with regard to the obligation to explicitly cite the source when quoting
data from this study.
Preface
First of all, a well-deserved thank you goes to my supervisor, prof. dr. De Clercq, and co-supervisor,
prof. De Bruyne, for their guidance and patience. I know I am not the easiest to work with, but their
continuous positivity and faith in my work kept me motivated to bring this dissertation to a successful conclusion. I
am very thankful to have been part of such a wonderful team, and I hope my work will help with
future research.
I would also like to thank my friends for believing in me when I did not. Some of them shared this
final journey with me, which was both a blessing and a curse at times. Nevertheless, I could not have
done this without their support, and for that I will be forever grateful. Honestly, anyone who can
handle me during one of my many breakdowns or overdramatic rants deserves a medal.
I am very proud of what I have achieved, and I hope the people around me are too. Enjoy the pinnacle
of my academic career; it is an interesting read if I may say so myself.
Preamble
Due to the unusual circumstances resulting from the coronavirus outbreak, the writing process of
this paper was slower and more difficult than usual. However, even though the situation affected
my mental capacity, there was no direct impact on the research itself, as it relied on my individual
work and there were no third parties involved. Thanks to the complete lockdown in the spring of 2020,
I was given some extra time to process the data and analyse the results, which can probably be
considered the only advantage of the pandemic. But as I decided to prioritise course material and
exams during my time in isolation, I had no choice but to postpone the completion of this paper to the
summer of 2020.
Abstract
In the research field of natural language processing, emotion detection has become a prominent topic.
As no standard framework exists for automatic emotion detection, neither for a specific domain
nor in general, our goal was to provide an emotion framework that is motivated both theoretically and
empirically for the domain of reality TV. This paper presents a cluster analysis on Dutch reality TV
transcriptions, with label sets for automatic emotion detection as a result. Since automatic
applications first require manually annotated data, an extensive model of 25 emotion categories from
psychological research was used to manually annotate 450 utterances from a self-made corpus of
reality TV transcriptions. Three Flemish TV series (“Bloed, Zweet en Luxeproblemen”, “Blind
Getrouwd” and “Ooit Vrij”) were included in the dataset, each representing a different topic. We
conducted a frequency and cluster analysis with the annotations in order to uncover underlying
relations between the emotion categories and eventually present limited, modified label sets. The
results revealed three topical label sets, as well as one general domain-specific label set. Although
the majority of the emotions in the sets turned out to be basic emotions, the selection of every
label in the final sets was supported by empirical research.
Table of contents
List of abbreviations
List of figures
List of tables
1 Introduction
2 Theoretical background
2.1 Emotion frameworks
2.1.1 Dimensional versus categorical models
2.1.2 Ekman’s Basic Six
2.1.2.1 Background/context
2.1.2.2 Definition of emotions
2.1.2.3 Emotion families
2.1.2.4 Previous research
2.2 Cluster analysis
2.2.1 Similarity measures
2.2.2 Clustering techniques
2.2.2.1 Hierarchical agglomerative (bottom-up)
2.2.2.2 Hierarchical divisive (top-down)
2.2.2.3 Iterative partitioning
2.2.2.4 Density search
2.2.2.5 Factor analysis variants
2.2.3 Linking methods
2.2.3.1 Single linkage (nearest neighbour)
2.2.3.2 Complete linkage (furthest neighbour)
2.2.3.3 Average linkage
2.2.3.4 Ward’s method
2.2.4 Criteria
2.2.5 Validation
3 Methodology
3.1 Annotation
3.2 Frequency analysis
3.3 Cluster analysis
4 Results
4.1 Frequency analysis
4.2 Cluster analysis
4.2.1 Bloed, Zweet en Luxeproblemen
4.2.2 Blind Getrouwd
4.2.3 Ooit Vrij
4.2.4 Combined data
5 Discussion
5.1 Analysis of the results
5.2 Comparison to previous research
5.3 Validity, reliability and added value
6 Conclusion
References
Appendices
List of abbreviations
NLP: natural language processing
AI: artificial intelligence
BZL: Bloed, Zweet en Luxeproblemen
BG: Blind Getrouwd
OV: Ooit Vrij
IAA: inter-annotator agreement
List of figures
Figure 1: Comparison of five clustering methods for nine criteria
Figure 2: 25 emotion categories with their subcategories
Figure 3: Frequencies of emotion categories compared per TV series
Figure 4: Frequencies of emotion categories for Bloed, Zweet en Luxeproblemen
Figure 5: Frequencies of emotion categories for Blind Getrouwd
Figure 6: Frequencies of emotion categories for Ooit Vrij
Figure 7: Frequencies of emotion categories for the three TV series combined
Figure 8a: Initial dendrogram for BZL
Figure 8b: Adapted dendrogram for BZL without infrequent emotions
Figure 9a: Initial dendrogram for BG
Figure 9b: Adapted dendrogram for BG without infrequent emotions
Figure 10a: Initial dendrogram for OV
Figure 10b: Adapted dendrogram for OV without infrequent emotions
Figure 11a: Initial dendrogram for the three TV series combined
Figure 11b: Adapted dendrogram for the combined data without infrequent emotions
Figure 12a: Initial dendrogram for tweets
Figure 12b: Adapted dendrogram for tweets without infrequent emotions
List of tables
Table 1: Polarity combinations with surprise
Table 2: Emotion sets
1 INTRODUCTION
Emotion detection and emotion analysis have many applications. When it comes to textual data, they
are often performed on product reviews to evaluate customer satisfaction, or on tweets to uncover
linguistic trends in emotional speech. With today's technical advances in particular, automatic
applications are becoming increasingly popular research topics in the field of natural language
processing (NLP).
To enable automatic emotion detection, though, manually annotated data is first needed to train
the machines. Although emotion analysis is a popular research field, no standard framework is
offered for emotion annotation tasks. There are plenty of categorical models available and many
researchers opt for one of the better-known models, such as Ekman’s basic emotions set (1992)
consisting of anger, disgust, fear, joy, sadness and surprise. While these basic emotions are the most
widely agreed-upon universal emotions, supported by substantial evidence (Ekman, Friesen, & Ellsworth, 1972),
and are frequently used for this type of research, most of the time no other valid reason is given as to
why those emotions would be the most appropriate for certain data.
As Mohammad (2016, p. 215) argues, it would be beneficial to adapt the emotion set to the
domain of the research at hand. That is why the goal of this study is to propose an empirically grounded
framework for automatic emotion detection on Dutch reality TV data. The study that we have conducted
was inspired by a cluster analysis on Dutch tweets by De Bruyne, De Clercq and Hoste (2019), and can
be seen as follow-up research to their study. We have adopted a similar process, but applied it to different
data to expand the research field.
Similar to our previous study related to emotion analysis (Airapetian, 2019), the data for this
research consists of transcriptions from Flemish reality TV shows. Three TV shows were selected, each
with a different topic. The extensive emotion model used for the annotations consists of 25 emotion
categories and stems from psychological research by Shaver, Schwartz, Kirson and O’Connor (1987).
The compiled corpus of annotated transcriptions was further used as input for a frequency and cluster
analysis. The study presented in this paper was designed to examine emotion clusters and compare the
derived emotion labels from the topical data to those from the general data. This brings us to our main
research question: “Is it possible to deduce a label set from experimental cluster analysis?” We also intend
to provide an answer to the following subquestions:
- Do the emotion clusters differ depending on the topic?
- Is there a difference between the emotion clusters for reality TV compared to those for
tweets?
- Do basic emotions provide a good foundation for emotion frameworks?
This thesis is structured as follows: section 2 offers some theoretical background. It provides an
overview of the most important aspects in relation to emotion frameworks and cluster analysis. More
specifically, the first part elaborates on the different classification models, Ekman’s basic emotions
model and the meaning of emotions, while the second part focuses on several approaches for the
different features of a cluster analysis, namely similarity measures, clustering techniques and linking
methods. The data and methodology for this study are described in detail in section 3. Then section 4
presents the results of both the frequency and cluster analyses, which are supported by clarifying graphs
and dendrograms. Section 5 further analyses the results and compares them to previous works, while
also reflecting on the validity and added value of this study. Finally, section 6 gives a closing statement
by repeating the main features of the study and summarising the arguments that support the answers to
our research questions.
2 THEORETICAL BACKGROUND
2.1 Emotion frameworks
2.1.1 Dimensional versus categorical models
There are several possible methods to classify emotions. As Buechel and Hahn (2016, pp. 1114-1115)
mention, a distinction can be made between categorical and dimensional models. The categorical
approach divides emotional states into emotion categories. The dimensional approach, on the other hand,
describes emotional states according to emotional dimensions, with the three most common dimensions
being valence, arousal and dominance. Valence is the polarity of the text and can be positive, negative
or neutral. Arousal describes the level of reaction to stimuli and the intensity of the emotion, ranging
from low to high. Finally, dominance refers to the degree of control associated with the emotion, which can
range from dominant to submissive. Dimensional models often use Russell and Mehrabian’s (1977)
Valence-Arousal-Dominance model, while categorical models usually refer to Ekman’s (1992) Basic Emotion
model, which divides emotions into six categories: anger, disgust, fear, joy, sadness and surprise.
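To make the distinction concrete, the two approaches can be sketched as data structures. The class names, the [-1, 1] value range and the example scores below are illustrative assumptions and are not taken from the cited models.

```python
from dataclasses import dataclass
from enum import Enum


class BasicEmotion(Enum):
    """Categorical approach: one label from a closed set (Ekman's six)."""
    ANGER = "anger"
    DISGUST = "disgust"
    FEAR = "fear"
    JOY = "joy"
    SADNESS = "sadness"
    SURPRISE = "surprise"


@dataclass(frozen=True)
class VAD:
    """Dimensional approach: a point in valence-arousal-dominance space.

    Assumption: each dimension is scaled to [-1.0, 1.0] for this sketch.
    """
    valence: float    # negative (-1) to positive (+1)
    arousal: float    # low (-1) to high (+1)
    dominance: float  # submissive (-1) to dominant (+1)

    def __post_init__(self):
        for name in ("valence", "arousal", "dominance"):
            value = getattr(self, name)
            if not -1.0 <= value <= 1.0:
                raise ValueError(f"{name} must lie in [-1, 1], got {value}")


# The same hypothetical utterance described in both ways (scores are made up).
categorical = BasicEmotion.ANGER
dimensional = VAD(valence=-0.8, arousal=0.7, dominance=0.4)
```

A categorical annotation is a choice from a fixed list, whereas a dimensional annotation is a coordinate, so two states can be compared by how far apart they lie.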
2.1.2 Ekman’s Basic Six
2.1.2.1 Background/context
Psychologist and emotion scientist Paul Ekman is well known for his studies of facial expressions and
emotions. His study of the non-verbal behaviour of the Fore tribe in Papua New Guinea was groundbreaking.
Members of the tribe were told a simple story while being shown a set of three faces and were
then asked to select the face that they thought matched the story (Ekman, & Friesen, 1971). Surprisingly,
the subjects generally interpreted the facial expressions in the same way as someone from a
Western society would. The fact that the tribe had lived in complete isolation from the rest of the world but
nonetheless recognized the same emotions as people from the West proved that facial expressions were
indeed universal.
Ekman, Friesen and Ellsworth (1972) found evidence for six basic emotions and with their
research eventually confirmed that all six had universal facial expressions. Most scientists now agree on
those six basic emotions and their distinctive facial expressions. However, this has not always been the
case and was certainly a gradual process. Charles Darwin (1872) was actually the first to claim that
emotions were a product of evolution and that they were universal, but it was not until Ekman’s research
that substantial evidence for that theory emerged (Paul Ekman International, 2018).
2.1.2.2 Definition of emotions
Ekman (1999, p. 46) says that “emotions are designed to deal with inter-organismic encounters”, i.e.
interactions between groups of people or even between humans and animals. However, actual interaction
with a second party is not always necessary. Emotions can also occur when we are not in the presence
of others, whether that presence is physical or imagined. The primary function of
emotions remains the same, namely to “mobilize the organism to deal quickly with important
interpersonal encounters” (Ekman, 1999, p. 46).
He further elaborates on past, present and future as three important factors that should be taken
into consideration when distinguishing basic emotions. A first factor to consider is the current situation:
what is happening inside and around the person? Are there any factors that could influence the person?
The second factor is the preceding situation: what was the situation before this point in time? Did
something happen that could have possibly triggered a certain emotion or reaction? The third factor is
the possible continuation of the situation: what is most likely to happen next? What are the possible
consequences?
Ekman identifies six basic emotions: anger, disgust, fear, joy, sadness and surprise. His model
is the most frequently used one in the field of emotion analysis. It originates from his research
on emotions and, more specifically, their relation to facial expressions. Plutchik (1980), like many others
(Ekman, 1992; Frijda, 1988; Izard, 1991; Parrot, 2001; Tomkins, 1962), agrees that some emotions are
indeed more basic than others, but he proposes a different set of emotions. He includes trust and
anticipation alongside Ekman’s six, while also further dividing the emotions into different degrees of
intensity. It is also worth mentioning that basic emotions such as those of Ekman can occur on their
own, but can also be combined to form more complex emotions (Ekman, 1999; Plutchik, 1962).
2.1.2.3 Emotion families
Ekman (1999, p. 47) is of the opinion that there is no such thing as a sine qua non for emotions, meaning
that a certain emotion cannot be distinguished from another emotion by a pre-defined set of
characteristics. According to Ekman (1999, p. 55), “each emotion is not a single affective state, but a
family of related states”. Emotions can therefore be divided into so-called emotion families. These can
vary in intensity and form, as well as in whether the emotion can be controlled or not and whether it
occurs spontaneously or deliberately (Ekman, 1997).
Unlike individual emotions, however, members of the same emotion
family do share characteristics, which distinguish them from other emotion
families. The Atlas of Emotions (2019), a project of the Dalai Lama in cooperation with Paul Ekman,
describes the five most widely agreed-upon basic emotions as follows:
- anger: when we are mentally or physically blocked
  e.g. annoyance, bitterness, fury
- disgust: when we are faced with something toxic or unpleasant
  e.g. dislike, aversion, loathing
- fear: when our safety or wellbeing is threatened
  e.g. anxiety, panic, terror
- joy: when we experience comfort, connection or pleasure
  e.g. amusement, excitement, ecstasy
- sadness: when we lose something valuable
  e.g. disappointment, misery, grief
2.1.2.4 Previous research
As mentioned before, research using Ekman’s emotions mainly focuses on their link with facial
expressions. However, Ekman’s model is also often applied in the emotion analysis of text. In this
research field, there have been numerous studies on tweets (Bakliwal et al., 2012; Wood, McCrae,
Andryushechkin, & Buitelaar, 2018), reviews (Fang, & Zhan, 2015; Thet, Na, & Khoo, 2010), blog
posts (Bakliwal, Arora, & Varma, 2012) and news stories (Godbole, Srinivasaiah, & Skiena, 2007), but
only a few on other kinds of data such as subtitles or transcriptions. As already presented in previous work
(Airapetian, 2019) and repeated above, there are numerous frameworks for emotion analysis. The most
popular categorical model is still that of Ekman (1992), but there are many more options to choose from.
While basic emotion frameworks are usually fairly restricted, extensive emotion frameworks often
introduce more complex secondary emotions by combining basic emotions. However, the emotion
frameworks are often selected arbitrarily and so are not adapted to the task or domain they are intended
for.
One way to group data in order to produce a more limited set is to conduct a cluster analysis. A
cluster analysis attempts to uncover the relations between emotion categories and,
based on similarity, divides them into clusters, which are comparable to the
emotion families described in section 2.1.2.3. If each cluster is then assigned an umbrella term, this results in
a limited set of final emotion labels where each label represents several emotion categories.
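The step from clusters to a limited label set can be sketched as a simple lookup. The groupings below are invented purely for illustration; they are not results from this study or from any cited work, and the function name is hypothetical.

```python
# Hypothetical clusters: umbrella label -> fine-grained emotion categories.
# The groupings are invented for illustration only.
clusters = {
    "anger": ["rage", "irritability", "exasperation"],
    "joy": ["cheerfulness", "contentment", "zest"],
}

# Invert the mapping so that every fine-grained category points to its label.
category_to_label = {
    category: label
    for label, members in clusters.items()
    for category in members
}


def reduce_annotation(categories):
    """Map fine-grained annotations to the limited label set (sorted, unique).

    Categories that belong to no cluster are silently dropped.
    """
    return sorted({category_to_label[c] for c in categories if c in category_to_label})


print(reduce_annotation(["rage", "zest", "irritability"]))  # ['anger', 'joy']
```

Each umbrella label stands for all categories in its cluster, so an utterance annotated with several related categories collapses to a single label.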
This process was employed by De Bruyne et al. (2019). They
conducted a cluster analysis using an extensive emotion framework in an attempt to generate a more
limited emotion set that was better grounded in its task (emotion detection) and domain (tweets).
They used the annotations of 229 Dutch tweets as input for their cluster analysis, which resulted in a
final label set containing love, joy, anger, nervousness and sadness. Because of their data-driven
approach, this framework for automatic emotion detection was not only motivated theoretically, but also
empirically.
The extensive framework they used is that of Shaver et al. (1987), which originates from the
field of psychology. For that study, 112 psychology students were asked to rate 213 emotion words
based on their prototypicality. The task resulted in a selection of 135 emotion words, which were then
sorted by 100 students into a non-predefined number of categories. The results were used as input for a
cluster analysis, which eventually resulted in a final set of 25 emotions: affection, cheerfulness,
contentment, disappointment, disgust, enthrallment, envy, exasperation, horror, irritability, longing,
lust, neglect, nervousness, optimism, pride, rage, relief, sadness, shame, suffering, surprise, sympathy,
torment, zest.
2.2 Cluster analysis
In the field of AI, cluster analysis is seen as a form of unsupervised learning; more specifically, it is
a data mining technique used for the natural grouping of data. Cluster analysis is sometimes also referred
to as typology construction, classification analysis or numerical taxonomy. The objective is to identify
underlying structures by grouping the cases from a dataset that are the most similar. The first step in that
process, though, is gathering and preparing the data that will be clustered and subsequently analysed.
During this process, a few topics need to be taken into consideration.
First, it is useful to discuss what exactly a cluster is. There are many descriptions of clusters rather than
one general definition, but what mainly characterizes a cluster is its “high internal homogeneity”
and “high external heterogeneity” (Lazar, 2012). This means that in a multidimensional space, members
of a cluster are located close to each other, while the clusters themselves lie further apart. Lazar (2012)
mentions some elements of the research design that need to be evaluated before processing the data.
These include the variables, the size of the dataset, outliers and the standardization of the data.
Firstly, it is important that the variables are chosen based on theoretical, conceptual and practical
considerations. Lazar (2012) makes a distinction between two selective methods: feature extraction
enables researchers to derive new and possibly more relevant features from the already existing features,
while with feature selection they solely choose the most relevant features.
Secondly, the dataset should be broad enough so that it represents all relevant categories and the
underlying structure can be studied. If the objective is to identify relatively large groups then a smaller
dataset will suffice, but if the objective is to identify small groups then the dataset must be large enough
to ensure that every group is included. Only then can the dataset be considered as representative.
Thirdly, there is always the possibility of outliers, and so researchers need to decide whether or
not to include these in the results. If a certain observation is not at all representative and could negatively
affect the hierarchy or help produce unrepresentative clusters, then that observation should be removed.
If the observation represents a smaller group which is considered irrelevant for that particular research,
then that observation should be removed as well in order to keep the focus of the resulting clusters on
groups that are relevant. If outliers do however represent a relevant group but are just a poor
representation because of their small number, then they should not be removed from the clustering
results.
Lastly, according to Lazar (2012, p. 33), “clustering variables that are not all of the same scale
should be standardized”. He describes some standardization techniques and makes a distinction between
variable standardization and sample standardization.
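The source does not spell out which standardization techniques are meant, so the sketch below assumes one common form of variable standardization, the z-score, in which each variable is rescaled to mean 0 and standard deviation 1. The function name and the data are hypothetical.

```python
from statistics import mean, pstdev


def standardize_columns(rows):
    """Z-score each variable (column): subtract its mean and divide by its
    population standard deviation. Constant columns are mapped to 0.0."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stdevs = [pstdev(col) for col in columns]
    return [
        [
            (value - m) / s if s > 0 else 0.0
            for value, m, s in zip(row, means, stdevs)
        ]
        for row in rows
    ]


# Invented data: column 0 is on a 0-1 scale, column 1 on a 0-1000 scale.
data = [[0.1, 100.0], [0.5, 500.0], [0.9, 900.0]]
standardized = standardize_columns(data)
```

After this step both columns carry the same values, so the variable with the larger original scale no longer dominates a distance calculation.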
Another consideration is how to measure the similarity between
individual cases. For this step there are several similarity measures at hand (see Section 2.2.1) and it is
common to draw up a similarity matrix for all cases. Once the similarity has been determined, the next
step is to form clusters. The most similar cases are grouped into the same cluster.
However, that is not where the clustering ends. The clusters that are closest to each other are merged
as well, and this is done repeatedly. Another question that might arise is: how do we determine the
number of clusters that have eventually been formed? To determine this, researchers can measure the
homogeneity of each cluster by calculating the average distance between cases from the same cluster.
The clustering solution can then be visualized in a graph or tree diagram to make the structure even
clearer.
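The homogeneity check described above can be sketched as follows. The Euclidean distance, the function names and the toy points are assumptions made for illustration; the mean distance between clusters is added only to contrast internal homogeneity with external heterogeneity.

```python
from itertools import combinations, product
from math import dist
from statistics import mean

# Two toy clusters of 2-D points (values invented for illustration).
cluster_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
cluster_b = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]


def within_cluster_distance(cluster):
    """Mean pairwise distance inside one cluster (low = homogeneous)."""
    return mean(dist(p, q) for p, q in combinations(cluster, 2))


def between_cluster_distance(first, second):
    """Mean distance between members of two clusters (high = heterogeneous)."""
    return mean(dist(p, q) for p, q in product(first, second))


internal = within_cluster_distance(cluster_a)
external = between_cluster_distance(cluster_a, cluster_b)
print(internal < external)  # True
```

A clustering solution is plausible when the within-cluster value stays small relative to the between-cluster value.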
2.2.1 Similarity measures
According to Fisher and van Ness (1971, p. 92) “one of the first steps in clustering is to get some measure
of closeness between two observations”. Additionally, Blashfield and Aldenderfer (1988, p. 457)
describe similarity as being “fundamental to the process of classification. Objects perceived as similar
are often classified as being in the same group, whereas those perceived as different are placed in other
groups.”
Most of the clustering techniques mentioned later (see Section 2.2.2) rely on the calculation of similarity
between cases. Whereas in everyday life the concept of classification based on similarity comes quite
naturally, scientists need to find an objective way to process their data and measure similarity. In order
to do this, they turn to statistical approaches. There are many ways to determine the similarity between
objects, but most of the similarity measures use the concept of metrics, which means that the degree of
dissimilarity is represented as the distance between cases when they are projected as points in space.
This can, for example, be supported by creating an N×N similarity matrix, with N referring to the number
of cases being clustered. However, this is not the standard procedure, and it is important that researchers
base their choices on the design of their research. Given below are three similarity measures that are
relevant within the context of cluster analysis, as presented by Sneath and Sokal (1973), Blashfield and
Aldenderfer (1988), and Lazar (2012) among others.
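As a minimal sketch of the N×N matrix mentioned above, the snippet below computes all pairwise distances between cases. It assumes the Euclidean distance and invented data, so strictly speaking it produces a dissimilarity matrix.

```python
from math import dist

# Three invented cases described by two metric variables each.
cases = [(1.0, 2.0), (4.0, 6.0), (1.0, 2.0)]


def distance_matrix(points):
    """Return the symmetric N x N matrix of pairwise Euclidean distances."""
    return [[dist(p, q) for q in points] for p in points]


matrix = distance_matrix(cases)
for row in matrix:
    print(row)
```

The diagonal is always zero, and identical cases (here the first and the third) also receive a distance of zero.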
A first way to measure object similarity is the correlation coefficient, which
expresses the correlation between cases across a set of variables. The coefficient compares the
values of the variables for each of the cases and yields a value between -1 and +1. If the value is 0, then there is no
linear relation between the cases.
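The text does not name a specific coefficient; the sketch below assumes the Pearson correlation, computed by hand between the profiles of two cases. The profiles are invented, and the function assumes that neither profile is constant.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long profiles.

    Assumption: neither profile is constant, otherwise the denominator is 0.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Invented profiles of three cases over three variables.
case_a = [1.0, 2.0, 3.0]
case_b = [2.0, 4.0, 6.0]   # same shape as case_a, larger magnitude
case_c = [6.0, 4.0, 2.0]   # mirrored shape

same_shape = pearson(case_a, case_b)      # close to +1
opposite_shape = pearson(case_a, case_c)  # close to -1
```

Note that case_a and case_b correlate perfectly although their values differ, which is why correlation captures the shape of a profile rather than its magnitude.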
A second possibility is distance measures. Distance measures are also described as ‘dissimilarity
measures’, because a high distance value means that the two cases that were compared are quite
dissimilar. In contrast to the correlation coefficient mentioned above, a distance value of zero means
that the cases have the same values for the same variables. What correlation coefficients and
distance measures have in common is that they both require metric data. Several distance
measures are often used, such as the Minkowski distances, the Euclidean distance, which is
a special case of the Minkowski distance, and Mahalanobis D², also known as the
generalized distance. The latter also incorporates the correlations among variables.
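The relation between the Minkowski and Euclidean distances can be illustrated with a short sketch. The function name and the example vectors are assumptions; the order p = 2 yields the Euclidean distance and p = 1 the city-block distance.

```python
def minkowski(x, y, p):
    """Minkowski distance of order p between two equally long vectors."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)


a, b = (0.0, 0.0), (3.0, 4.0)
city_block = minkowski(a, b, 1)  # 7.0
euclidean = minkowski(a, b, 2)   # 5.0
identical = minkowski(a, a, 2)   # 0.0: same values for the same variables
```

The Mahalanobis distance is left out of this sketch because it additionally requires the covariance matrix of the variables.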
A final category of similarity measures is that of association coefficients. As opposed to the two
previously mentioned similarity measures, association coefficients do not require metric data.
An important metric to mention here is Dice’s coefficient (Dice, 1945). It is most commonly used to
measure the distance between Boolean vectors, which are vectors that contain no other values than 0
and 1. It assigns a higher weight to double positives, that is, positions at which both vectors have
the value 1 and which therefore represent cases of mutual agreement.
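To make the role of double positives concrete, the sketch below computes the Dice dissimilarity for two Boolean vectors. It follows the common definition (mismatches divided by twice the double positives plus the mismatches); the text itself does not spell out the formula, so this is an assumption, as is the handling of two all-zero vectors.

```python
def dice_dissimilarity(u, v):
    """Dice dissimilarity between two Boolean (0/1) vectors of equal length."""
    both = sum(1 for a, b in zip(u, v) if a == 1 and b == 1)   # double positives
    mismatch = sum(1 for a, b in zip(u, v) if a != b)          # present in one vector only
    denominator = 2 * both + mismatch                          # double positives count twice
    if denominator == 0:
        return 0.0   # assumption: two all-zero vectors are treated as identical
    return mismatch / denominator

# Hypothetical Boolean vectors.
u = [1, 1, 0, 0, 1]
v = [1, 0, 0, 1, 1]

d_uv = dice_dissimilarity(u, v)                            # 2 mismatches, 2 double positives
d_padded = dice_dissimilarity(u + [0, 0], v + [0, 0])      # extra double negatives
```

Double negatives do not enter the formula, so padding both vectors with zeros leaves the result unchanged.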
2.2.2 Clustering techniques
As with the similarity measures above, there are several possibilities when it comes to grouping data
and forming clusters. Summarised below are five well-known approaches for clustering, each of them
differing in how the cluster groups are formed (Blashfield & Aldenderfer, 1988).
2.2.2.1 Hierarchical agglomerative (bottom-up)
This first technique is probably the most frequently used method for clustering. Following this
technique, clusters are combined based on the similarity between the types used for a certain study, until
all types are grouped into one cluster. The degree of similarity, which is needed to form and distinguish
the clusters, is determined by calculating a similarity matrix. It is useful to mention that opting for this
approach results in non-overlapping clusters, meaning that each type can only be part of one cluster.
However, a cluster can belong to a larger cluster, which creates the typical hierarchical aspect of this
technique. The clusters are often graphically displayed in a dendrogram, also known as a tree diagram,
to give a clear overview of the hierarchical relations. Another aspect of the hierarchical agglomerative
technique is the importance of linkage rules. Here again, there are different options to choose from,
which will be elaborated on in 2.2.3.
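The bottom-up procedure described above can be sketched as a naive loop over a distance matrix. This is an illustrative sketch, not the algorithm used in this study: the function name `agglomerate`, the toy matrix and the use of single linkage as the linkage rule are all assumptions.

```python
def agglomerate(distance, labels):
    """Naive bottom-up clustering; returns the merge history.

    `distance` is a symmetric NxN list of lists. Single linkage is used here
    only as a placeholder linkage rule.
    """
    clusters = [[i] for i in range(len(labels))]   # every type starts on its own
    history = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: smallest distance between any two members.
                d = min(distance[a][b] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merged = clusters[i] + clusters[j]
        history.append((sorted(labels[k] for k in merged), d))
        # Replace the two merged clusters by the new one (j > i, so pop j first).
        clusters.pop(j)
        clusters.pop(i)
        clusters.append(merged)
    return history

# Hypothetical distance matrix for four types.
labels = ["A", "B", "C", "D"]
distance = [
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.7, 0.8],
    [0.8, 0.7, 0.0, 0.2],
    [0.9, 0.8, 0.2, 0.0],
]
history = agglomerate(distance, labels)
```

Each entry of `history` holds the members of a newly formed cluster and the distance at which it was formed, which is the information a dendrogram displays.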
2.2.2.2 Hierarchical divisive (top-down)
The hierarchical divisive technique is often seen as the opposite of the hierarchical agglomerative one.
Contrary to the abovementioned technique, here all types first belong to one and the same cluster,
which is then divided into smaller parts. Even though both of these hierarchical methods are popular,
many researchers prefer the agglomerative technique over the divisive one because algorithms for
divisive strategies are computationally more demanding. Within this method, a distinction is made between monothetic
clusters and polythetic clusters, depending on the criteria to be part of a cluster. With the monothetic
strategy, cluster members are determined based on one (or more) specific variable(s). In order for a type
to be part of such a cluster, it needs to have a certain score for those specific variables. That is why the
monothetic divisive strategy is mostly used with binary data. In order to belong to a polythetic cluster,
however, there is not one single variable which is needed; it suffices for the type to have certain subsets
of the variables.
2.2.2.3 Iterative partitioning
With the iterative partitioning technique, a series of processes is followed. First the dataset is divided
into clusters. After the centroids of the clusters have been computed, each type or data point is allocated
to the closest centroid. This results in new clusters, and so new centroids can be computed. This process
is then repeated until each data point stays in the same cluster and no further shifts take place. What
differentiates this technique from some other techniques is that it works with the data itself, and not with the
similarities between data points. It also makes more than one pass through the data. It is useful to mention here
that this method results in single-rank clusters, meaning that the clusters are not nested and so there is
no hierarchy. A well-known example of partitioning-based clustering is K-means clustering.
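The steps of iterative partitioning can be sketched as follows. This is a minimal K-means-style illustration starting from a given initial partition; the function name `k_means`, the squared Euclidean distance and the toy points are assumptions, and the sketch simply raises an error if a cluster becomes empty.

```python
def k_means(points, assignment, max_iter=100):
    """Iterative partitioning from an initial assignment of points to clusters."""
    k = max(assignment) + 1
    centroids = []
    for _ in range(max_iter):
        # Compute the centroid (mean point) of each current cluster.
        centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if not members:
                raise ValueError("a cluster became empty; choose another start")
            centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
        # Reallocate every point to its closest centroid (squared Euclidean distance).
        new_assignment = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignment == assignment:   # no point changed cluster: stop
            break
        assignment = new_assignment
    return assignment, centroids

# Hypothetical two-dimensional data with a deliberately poor initial partition.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
final_assignment, final_centroids = k_means(points, [0, 1, 0, 1, 0, 1])
```

The result is a flat list of cluster indices, which reflects the single-rank, non-nested nature of the clusters.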
2.2.2.4 Density search
To clarify, the term ‘density’ refers to the number of points within a certain space. When using
the method of density search, clusters are seen as a region with a high density of data points in relation
to the regions surrounding it. The purpose of this technique is to form new clusters instead of joining
new cases to already existing clusters. To do this, the distance is measured between an existing cluster
and a new case or cluster.
2.2.2.5 Factor analysis variants
The usual methods of factor analysis typically produce an N×N correlation matrix between variables.
However, variants of factor analysis which are used to form clusters produce a correlation matrix
between the cases, not between variables. From that correlation matrix, factors are then extracted, with
each of the factors resulting in a separate cluster. When a case belongs to a certain cluster, it means that
it has a high correlation to that corresponding factor.
2.2.3 Linking methods
When discussing the hierarchical agglomerative approach for clustering (see Section 2.2.2.1), we briefly
mentioned the use of linkage rules. These are essential when it comes to forming and linking clusters.
While there are many linkage methods, our focus will be on the four most common ones, namely average
linkage, complete linkage, single linkage and Ward’s method. All hierarchical agglomerative
methods first look for the two most similar cases in the similarity matrix.
After these two cases have been merged into a cluster, they do the same for the next two most similar
cases in the matrix, and so on. Importantly, after a case has been merged with
another one, each separate case in the matrix is replaced by the newly formed cluster it now belongs
to. Where agglomerative methods differ, however,
is the way in which they merge two clusters instead of two cases. This will be discussed below for the
four most popular linking methods as presented by Blashfield and Aldenderfer (1988). Each explanation
is supplemented with the definition of a cluster for that particular method, as well as some other
characteristics.
2.2.3.1 Single linkage (nearest neighbour)
When using single linkage to merge two clusters, the merging is based on a certain similarity between
at least one case from each cluster. This means that a case from the first cluster has one aspect in common
with a case from the second cluster, or in other words, that there is a single link between two cases from
different clusters. Blashfield and Aldenderfer (1988, p. 450) define this type of cluster as “a group of
entities such that every member of the cluster is more similar to at least one member of the same cluster
than it is to any member of another cluster”. The distance between the two clusters is then equal to the
shortest distance between a case from the first cluster and a case from the second cluster.
The biggest advantage of single linkage is that it is one of the few methods that will not be
affected by monotonic changes to the values in the similarity matrix (cf. monotone admissibility in
Section 2.2.4). Even though single linkage has this advantage
over most other hierarchical agglomerative linking methods and is one of the most commonly used
methods, its main problem is that it tends to form long and thin cluster chains. As
the clustering process continues, that long chain will gradually add new cases to the cluster. This can
create meaningless outcomes, for example when the data is divided into only two clusters, with one
cluster containing only one single case and the other cluster containing all the other cases.
2.2.3.2 Complete linkage (furthest neighbour)
This method can be seen as the opposite of the single linkage rule. Whereas single linkage only needs
one aspect of similarity between two cases from different clusters, complete linkage demands all cases
from both clusters to be similar in order to merge those clusters. The high level of similarity implies that
a cluster can be defined as “a group of entities in which each member is more similar to all members of
the same cluster than it is to all members of any other cluster” (Blashfield & Aldenderfer, 1988, p. 451),
and so the resulting clusters will be relatively compact compared to those of other linking methods.
Complete linkage is said to be a space-dilating method, which means that when clusters are being
merged, they contract and leave more space in between them. This results in smaller and more separated
clusters of approximately the same size. The distance between two clusters is then equal to the greatest
distance between a case from the first cluster and a case from the second cluster. A disadvantage of
complete linkage is that it tends to produce spherical clusters.
2.2.3.3 Average linkage
The average linking rule is some sort of compromise between the single and complete linking methods.
Sneath and Sokal (1973) labelled single linkage too liberal because only a small level of similarity was
required for the clusters to be merged, while labelling complete linkage too conservative because of the
high requirements of similarity. As the name already implies, the average linking method relies on an
average value of similarity between all cases from one cluster and all cases from another cluster. If a
certain level of similarity is reached, then the two clusters are merged. According to Blashfield and
Aldenderfer (1988, p. 452), “this method defines a cluster as a group of entities in which each member
has a greater mean similarity with all members of the same cluster than it does with all members of any
other cluster”.
Contrary to complete linkage, the average linkage method is said to be space-conserving. This
means that the clusters do not contract nor expand, but instead maintain the original distance between
objects in that space. For this method, the distance between two clusters is equal to the average of the distances
between each case from the first cluster and each case from the second cluster. The main advantages of average
linkage are that it is less affected by outliers than some of the other linking methods are, and that it
usually comes close to recovering known structures in the data. A disadvantage though is that it tends
to generate clusters with approximately the same amount of variance within clusters.
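The cluster distances defined for single, complete and average linkage in Sections 2.2.3.1 to 2.2.3.3 can be compared side by side in a short sketch. The function names, the one-dimensional toy cases and the absolute-difference distance are hypothetical choices made only for illustration.

```python
from itertools import product

def pairwise(cluster_a, cluster_b, dist):
    """All distances between a case in cluster_a and a case in cluster_b."""
    return [dist(a, b) for a, b in product(cluster_a, cluster_b)]

def single_linkage(cluster_a, cluster_b, dist):
    return min(pairwise(cluster_a, cluster_b, dist))     # nearest neighbour

def complete_linkage(cluster_a, cluster_b, dist):
    return max(pairwise(cluster_a, cluster_b, dist))     # furthest neighbour

def average_linkage(cluster_a, cluster_b, dist):
    d = pairwise(cluster_a, cluster_b, dist)
    return sum(d) / len(d)                               # mean over all pairs

def dist(a, b):
    """Absolute difference between two one-dimensional cases."""
    return abs(a - b)

# Hypothetical one-dimensional clusters.
cluster_a = [1.0, 2.0]
cluster_b = [5.0, 9.0]

d_single = single_linkage(cluster_a, cluster_b, dist)
d_complete = complete_linkage(cluster_a, cluster_b, dist)
d_average = average_linkage(cluster_a, cluster_b, dist)
```

For the same pair of clusters, the single-linkage distance is the smallest, the complete-linkage distance the largest, and the average-linkage distance lies in between, which matches the view of average linkage as a compromise.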
Average linking has also inspired some variations on this method, such as centroid clustering,
median clustering and the weighted average method. With centroid linkage, the distance between two
clusters is the distance between the centres of those two clusters. For median clustering, the idea of
calculating the mean distance between two clusters is mostly the same. But instead of calculating the
average distance, this linkage method takes the median distance between a case in one cluster and a case
in another cluster. Finally, weighted average clustering implicitly weights the distances between clusters
and cases in relation to the size of the clusters. These weights are not determined by the researchers;
instead, they result from the algorithm. The main difference between unweighted and weighted average
linking is that the unweighted method calculates an average that is proportionate to cluster size, while the
weighted method simply averages the distances of the two merged clusters without taking their sizes into
account. The implicit weights arise precisely because these proportions are not considered.
2.2.3.4 Ward’s method
The objective of Ward’s method is to minimize the variance within clusters (Ward, 1963). Ward’s
method introduces the concept of ‘the error sum of squares’ (ESS). This value equals zero when all cases
are still in their original cluster, and increases when clusters are merged. For Ward’s method, the distance
between two clusters is equal to how much the sum of squares increases. When two clusters show the
lowest increase in ESS, those clusters are merged. The clusters are usually fairly equal in size. So for
this method, a cluster can be defined as “a group of entities in which the variance among the members
is relatively small” (Blashfield & Aldenderfer, 1988, p. 452). A great advantage of this method is that
it can find known structures in the data. Just like complete linkage, Ward’s method is space-dilating as
well.
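The error sum of squares and the resulting cluster distance in Ward’s method can be illustrated with a short sketch. The function names `ess` and `ward_distance` and the toy clusters are hypothetical; the computation follows the description above, in which the distance between two clusters equals the increase in ESS caused by merging them.

```python
def ess(cluster):
    """Error sum of squares: squared deviations of the cases from the cluster mean."""
    n = len(cluster)
    mean = [sum(x) / n for x in zip(*cluster)]
    return sum((v - m) ** 2 for case in cluster for v, m in zip(case, mean))

def ward_distance(cluster_a, cluster_b):
    """Increase in ESS caused by merging two clusters."""
    return ess(cluster_a + cluster_b) - ess(cluster_a) - ess(cluster_b)

# Hypothetical two-dimensional clusters.
a = [(1.0, 1.0), (1.0, 3.0)]
b = [(2.0, 2.0)]
c = [(9.0, 9.0), (9.0, 11.0)]

ess_single_case = ess(b)       # a cluster with one case has an ESS of zero
d_ab = ward_distance(a, b)     # small increase: b lies close to a
d_ac = ward_distance(a, c)     # large increase: c lies far from a
```

Under this rule, a and b would be merged before a and c, since that merge produces the lowest increase in ESS.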
2.2.4 Criteria
Researchers have several ways of choosing which clustering method is the best for their particular
research. One way to compare clustering methods is to subject them to different criteria. In their study,
Fisher and van Ness (1971) compared a variety of conditions for five different clustering methods.
Figure 1 shows the results and suggests that single linkage satisfied the most conditions. Described
below are the nine admissibility conditions or criteria for evaluating clustering methods as presented by
Fisher and van Ness (1971).
o connected admissible: When all cases from the same cluster are connected by lines, the
clustering method should not produce clusters in which lines from other clusters intersect.
o image admissible: It should not be possible for the data to be clustered in any other way which
is considered better than the original clustering. This means that with another method the
differences within clusters should not be larger compared to the original clustering and
differences between clusters should not be smaller.
o convex admissible: The convex hulls of different clusters should not intersect.
o well-structured admissible: The data should be structured in such a way that the clustering
becomes clear. This criterion is further divided into two separate admissibility conditions:
▪ exact tree: The distance matrix can be reconstructed by only consulting the hierarchical
tree structure.
▪ k-group: All distances within the cluster are smaller than all distances between different
clusters.
o point proportion admissible: When part of a dataset is duplicated and added to a modified dataset
consisting of the original dataset plus the duplicated part, then the results for the modified
dataset should remain the same as those for the original dataset, and so the boundaries of the
clusters should not change.
o cluster proportion admissible: When duplicating each cluster, the clustering method should
produce clusters with the same boundaries.
o cluster omission admissible: When the dataset has been clustered and then one cluster is
removed from the original dataset, the clustering method should produce the same clusters from
the modified dataset as from the original dataset.
o monotone admissible (= monotonic invariant): When an element in the similarity or distance
matrix is changed, the clusters should stay the same.
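Of the conditions listed above, the k-group structure lends itself most easily to a small check. The sketch below tests whether a given partition satisfies the condition as worded here, namely that all within-cluster distances are smaller than all between-cluster distances. The function name `is_k_group` and the toy partitions are hypothetical, and the sketch is not taken from Fisher and van Ness (1971).

```python
def is_k_group(clusters, dist):
    """True if every within-cluster distance is smaller than every between-cluster distance."""
    within = [
        dist(a, b)
        for c in clusters
        for i, a in enumerate(c)
        for b in c[i + 1:]
    ]
    between = [
        dist(a, b)
        for i, c1 in enumerate(clusters)
        for c2 in clusters[i + 1:]
        for a in c1
        for b in c2
    ]
    if not within or not between:
        return True   # nothing to compare
    return max(within) < min(between)

def dist(a, b):
    """Absolute difference between two one-dimensional cases."""
    return abs(a - b)

# Hypothetical partitions of the same four one-dimensional cases.
well_separated = is_k_group([[1.0, 2.0], [8.0, 9.0]], dist)
badly_mixed = is_k_group([[1.0, 8.0], [2.0, 9.0]], dist)
```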
Figure 1: Comparison of five clustering methods for nine criteria (Fisher & van Ness, 1971, p. 94)
2.2.5 Validation
After the cluster analysis has been conducted, there are several possibilities to validate the results. Lazar
(2012) proposes some indices for cluster validation, namely an external, internal and relative index.
Following the external index, the cluster labels are compared to already existing labels provided by
experts. The internal index evaluates the data on its own and does not match the results with external
information. Lastly, the relative index is used to compare various clustering methods and their cluster
solutions. This can be done with both internal and external indices.
3 METHODOLOGY
There are several parts to this research. A first substantial part consisted of an annotation task
(Section 3.1). This was then used as input for the subsequent and most important parts of the research,
namely the frequency analysis (Section 3.2) and cluster analysis (Section 3.3). Below is a detailed
description of the entire process. The main research question we wish to answer is “Is it possible to
deduce a label set from experimental cluster analysis?”. Our hypothesis for that research question is that
it is indeed possible to provide a label set based on emotion clusters, seeing that both Shaver et al. (1987)
as well as De Bruyne et al. (2019) have already succeeded in doing so (see Section 2.1.2.4). Furthermore,
we think the subquestions about topic and domain will have an affirmative answer as well, as domain
adaptation is a frequently occurring problem in the field of NLP (Daumé & Marcu, 2006; Glorot,
Bordes, & Bengio, 2011).
For the first part, a total of nine episodes was selected from three different Flemish reality TV series. The
series in question are Bloed, Zweet en Luxeproblemen (BZL), Blind Getrouwd (BG), and Ooit Vrij (OV)
and they were selected based on their emotional content. Additionally, each of these series was selected
because they had a different topic, which makes them suitable for a study on domain adaptation. Bloed,
Zweet en Luxeproblemen shows six privileged youngsters being faced with problematic issues in third
world countries; Blind Getrouwd follows people who were chosen to marry their perfect match without
having ever met them before; Ooit Vrij is a documentary series on prisoners in Belgium and their journey
to being released. For each series we selected the first and last episodes, as well as an episode in the
middle to have a good general overview of its content. The nine episodes were either downloaded
beforehand, or could be consulted on the online video player of the providing TV channel. It is important
to mention that even though the video files were already subtitled for certain parts, those subtitles were
not taken into consideration; instead, the episodes were transcribed manually by a student worker. The
transcriber did not shorten or correct the spoken language in any way and every sentence was transcribed
exactly the way it was said.¹
After the episodes were transcribed, 1000 utterances, so approximately 333 utterances per series,
were selected for further research.² For the research presented here, a subset of 450 utterances was
selected from the initial dataset: 145 utterances from BZL, 155 from BG and 150 from OV.
¹ The transcriptions are available on request. Please contact [email protected].
² For more information on the PhD research in question, see https://research.flw.ugent.be/en/projects/emotionl-emotion-detection-dutch.
3.1 Annotation
The 450 utterances were first prepared for manual annotation using an emotion set based on
psychological research (see Section 2.1.2.4) containing 25 emotion categories: anger, contentment,
disappointment, disgust, enthrallment, enthusiasm, envy, fear, frustration, irritation, joy, longing, love,
lust, nervousness, optimism, pity, pride, rejection, relief, remorse, sadness, suffering, surprise, and
torment.
The transcriptions were placed in separate Excel documents per TV series, which were designed
as follows (see Appendix 1): The top row of the file contained all utterances next to each other in
different columns. Below that, the first column of the annotation files contained 25 rows for the emotion
categories and several subcategories (Figure 2). There was also an extra row for any further comments,
such as the presence of irony. All utterance columns were labelled using binary 0|1 annotations for all
of the 25 emotion categories, depending on whether or not the emotion was present in that utterance.
The annotation task was not limited to one label per utterance, so that multiple emotion labels could be
assigned. Not only emotion words were taken into account, but also the overall feeling of the utterances.
To do this, the annotator was asked to perform the labelling task from a speaker perspective, meaning
that they took the point of view of the speaker to judge their feelings at that moment. To clarify certain
features of the task, we will present and comment on some examples in the following paragraphs.
Figure 2: 25 emotion categories with their subcategories
The first utterance contains an example of an emotion word, namely ‘content’ (equivalent to the English
adjective ‘content’ meaning ‘glad’ or ‘pleased’). This might indicate the presence of contentment or joy.
Other possible emotion words include ‘blij’, ‘prachtig’, ‘grappig’, ‘leuk’, ‘gelukkig’, ‘benieuwd’,
‘gefrustreerd’, ‘boos’, ‘gechoqueerd’, ‘verschrikkelijk’, ‘teleurgesteld’, ‘nerveus’ (English translations:
‘joyful’, ‘beautiful’, ‘funny’, ‘nice’, ‘happy’, ‘curious’, ‘frustrated’, ‘angry’, ‘shocked’, ‘horrible’,
‘disappointed’, ‘nervous’) and so on.
“Ah dag Aagje. Content dak u zie.”
The next example from the BG dataset (only the bold sentences are meant to be annotated) shows an
utterance that was labelled with multiple emotion categories. The speaker was told he was about to go
skydiving, which came as an unexpected surprise to him. He felt scared and nervous about the news
because it is totally out of his comfort zone, but also seemed excited to try something adventurous. That
is why these sentences in bold were labelled with the four emotions of surprise, fear, nervousness and
enthusiasm.
“What the fuck… Wa gaan wij doen? - Ge ziet daar een vliegtuig staan eh. - Wij gaan vliegen
jongen. - En gij ga springen. - Nee nee nee nee nee. Zijde ant zwanzen ofwa?”
The utterance below was taken from the dataset for OV. It contains an example of when irony was
indicated in the comment section of the annotation file. At the same time, it helps to clarify what is
meant by ‘annotated from a speaker perspective’. The first part of the excerpt (the preceding sentence)
was uttered by someone who was teasing her colleague about being bald, while the second part, the ironic
utterance, is the answer of said colleague. While the first person enjoyed the situation and found it quite
funny, her colleague most likely did not. This illustrates the difference in perspective: from the
perspective of an outsider, the teasing is seen as positive, but from the speaker’s perspective, the event
is actually perceived in a negative way and so the utterance in bold should be annotated for negative
emotion categories. Even though the speaker used a positive emotion word, namely ‘geestig’ (a Flemish
word meaning ‘funny’), that word was used in a sarcastic way, making the utterance rather negative.
This ironic use of ‘geestig’ also demonstrates that not only emotion words should be taken into account,
but rather the entire context of the utterance.
“Kan kik jou ook nie verplichten vo jon haar te laten groeien eh. - Zeer geestig wih.”
3.2 Frequency analysis
The annotation results for the three TV series were first compared on the frequency of the emotion
categories. Using the formulas provided by Excel, we calculated absolute frequencies, as well as relative
frequencies in percentages. These frequencies, along with their emotion categories, were then ordered
from high to low. All results were combined in one table to give a clear overview and facilitate the
comparison between the different series. Besides this, all numbers were also added and the frequencies
were recalculated to show what the results would be if no distinction was made between the different
series and topics. This will be useful when it comes to studying the issue of domain adaptation. All this
data was also converted into graphs for visual clarification.
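The frequency calculations described above were carried out with Excel formulas. As an equivalent illustration, the sketch below derives absolute and relative frequencies from binary annotations and orders them from high to low. The function name, the toy annotations and the choice to express relative frequencies as a share of all assigned labels are assumptions, as the text does not specify the denominator.

```python
def frequencies(annotations):
    """Absolute and relative (%) label frequencies, ordered from high to low.

    `annotations` maps each emotion to its list of 0/1 values, one per utterance.
    """
    absolute = {emotion: sum(values) for emotion, values in annotations.items()}
    total = sum(absolute.values())   # assumption: percentages relative to all labels
    table = [
        (emotion, count, 100 * count / total if total else 0.0)
        for emotion, count in absolute.items()
    ]
    return sorted(table, key=lambda row: row[1], reverse=True)

# Hypothetical toy annotations for five utterances (not data from this study).
toy = {
    "joy":         [1, 0, 1, 0, 0],
    "frustration": [0, 1, 0, 1, 1],
    "surprise":    [1, 0, 0, 0, 0],
}
table = frequencies(toy)
```

Combining the series, as described above, would amount to concatenating the 0/1 lists per emotion before calling the function.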
For the frequency analysis, we first studied the overview that shows a comparison of the three
series per emotion category, looking for any striking peaks or patterns. Then we studied the top 3, top 5
and top 10 of most frequent emotions per topic. More specifically, we took a closer look at how many
of these emotions were positive or negative, and how many belonged to the emotion model proposed by
Ekman (1992). Additionally, the number and positive/negative polarity of emotions that occurred fewer
than ten times were compared, as well as the emotions that did not occur at all for a certain series, if
any. The reason for examining the spread of Ekman’s emotions is to determine whether such basic
emotion frameworks would be suitable for emotion analysis. The results for the three series were then
compared to the combined results to study the difference between topical emotion frequencies and
general emotion frequencies. When analysing the combined frequencies, we referred back to the results
per series and examined the positions in the topical ranking in order to find an explanation for their
position in the general ranking.
3.3 Cluster analysis
The next part of our research was the cluster analysis, which involved several preparatory steps.
In order to distinguish different clusters, it is first necessary to form tree diagrams, also known as
dendrograms. These dendrograms depict the relations between the emotion categories and are built
from pairwise distances, so we started by calculating the Dice dissimilarity.
As explained in Section 2.2.1, the Dice dissimilarity measures how dissimilar two Boolean vectors
are. For our study, this was done by taking the binary 0|1 annotations and using the resulting Dice
dissimilarity between emotion pairs to draw up a 25×25 distance matrix, which gave us a first insight into
which emotion categories are most similar. This data was then used as input for a hierarchical clustering
algorithm. For each TV series, we generated dendrograms using the seven different linkage methods
described in section 2.2.3: average, centroid, complete, median, single, weighted linkage and Ward’s
method. Dendrograms were also generated for the overall results of the three series combined. After
comparing all linkage methods and using a process of elimination, Ward’s method was judged to be
the most suitable one.
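The construction of the distance matrix from the binary annotations can be sketched as follows. The example uses three hypothetical emotion categories and six hypothetical utterances instead of the 25 categories and the real data; the function names are illustrative, and the Dice formula follows the common definition, which is an assumption about the exact computation used.

```python
def dice(u, v):
    """Dice dissimilarity between two 0/1 annotation vectors."""
    both = sum(a & b for a, b in zip(u, v))       # utterances labelled with both emotions
    mismatch = sum(a ^ b for a, b in zip(u, v))   # utterances labelled with only one of them
    total = 2 * both + mismatch
    return mismatch / total if total else 0.0     # assumption: all-zero pairs give 0.0

def distance_matrix(annotations):
    """Square matrix of pairwise Dice dissimilarities between emotion categories."""
    emotions = list(annotations)
    matrix = [
        [dice(annotations[e1], annotations[e2]) for e2 in emotions]
        for e1 in emotions
    ]
    return emotions, matrix

# Hypothetical annotations: one 0/1 value per utterance for each emotion.
toy = {
    "joy":         [1, 0, 1, 0, 0, 1],
    "contentment": [1, 0, 1, 0, 0, 0],
    "frustration": [0, 1, 0, 1, 1, 0],
}
emotions, matrix = distance_matrix(toy)
```

In this toy example, joy and contentment are close (0.2), while joy and frustration never co-occur and therefore reach the maximum dissimilarity of 1.0. A category that was never annotated would likewise be at distance 1.0 from every annotated category.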
The next step was to decide on a cut-off value and define the clusters. After trying out and
considering several options, 1.6 was found to be a suitable cut-off value, as it produced an acceptable
number of clusters with a clear structure. If the cut-off line happened to coincide with a horizontal
linking line between members or clusters, we decided to disregard that link and split the clusters. Each
resulting cluster was assigned a different colour to create a clear overview of the separate clusters. For
the umbrella term per cluster, it was decided to adopt the Ekman emotion if one was part of that
particular cluster. If not, we selected the emotion category with the highest frequency. In case of multiple
Ekman emotions within one cluster, the two approaches were combined, meaning that the Ekman
emotion with the highest frequency was selected. For the first cluster analysis, the number of clusters
per TV series was compared and the clusters themselves were studied on their members and polarity, as
well as on their resemblance to Ekman’s basic emotions.
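The rule for choosing the umbrella term per cluster can be written down compactly. In the sketch below, the function name `umbrella_label` and the cluster memberships are hypothetical; the counts for sadness, disgust, frustration and joy are the BZL frequencies reported in Section 4.1, while the count for irritation is a made-up placeholder.

```python
EKMAN = {"anger", "disgust", "fear", "joy", "sadness", "surprise"}

def umbrella_label(cluster, frequency):
    """Pick the cluster label: the most frequent Ekman emotion if the cluster
    contains one, otherwise the most frequent member overall."""
    ekman_members = [e for e in cluster if e in EKMAN]
    candidates = ekman_members or list(cluster)
    return max(candidates, key=lambda e: frequency[e])

frequency = {
    "sadness": 55, "disgust": 46, "frustration": 47, "joy": 23,   # BZL counts from the text
    "irritation": 40,                                              # hypothetical placeholder
}

# Hypothetical cluster memberships, for illustration only.
label_two_ekman = umbrella_label(["sadness", "disgust", "frustration"], frequency)
label_one_ekman = umbrella_label(["frustration", "joy"], frequency)
label_no_ekman = umbrella_label(["frustration", "irritation"], frequency)
```

The second case shows that an Ekman emotion takes precedence even when a non-Ekman member of the cluster is more frequent.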
However, as one of our objectives is to examine topical emotion frequencies, we expected some
emotions to be represented more than others when processing the frequency scores for the different TV
series. Following this hypothesis, we generated three new dendrograms (one per TV series) using
Ward’s method once again, but this time excluding the emotion categories which occurred fewer than ten
times. This threshold of ten was adopted from a similar cluster analysis performed by De Bruyne et al.
(2019). The same cut-off value of 1.6 was maintained to differentiate between the new clusters. In the
new dendrograms, each adapted cluster was assigned the same colour as its corresponding original
cluster. That way, it would immediately become clear which clusters disappeared and which members
shifted or were removed from the original clusters.
The adapted dendrograms and more importantly the clusters were then compared to the original
hierarchical structures. For the second cluster analysis, we examined the changes within clusters, as well
as the modified linking between categories or clusters, if any. Similar to the first cluster analysis, the
distribution of positive and negative emotion clusters was studied as well. Finally, we also took a closer
look at the resemblance between our final cluster labels and Ekman’s basic emotions.
4 RESULTS
4.1 Frequency analysis
This section presents the results from the frequency analysis. First the frequency scores from the three
TV series are presented next to each other in a general overview, and we will give some first impressions.
Then we will take a closer look at the frequency graphs per TV series, where the specific scores are
mentioned and the emotion categories are ranked from high to low. For each graph, the top 3, top 5 and
top 10 of most frequent emotions are examined, as well as the emotions that occur less than ten times
and the spread of the Ekman emotions. We also comment on the polarity of the emotion categories,
which is indicated in colour next to the emotion label. After the analysis per TV series, the same aspects
are studied for the combined frequency scores, of which the results are presented in the final part of this
section.
Figure 3: Frequencies of emotion categories compared per TV series
Figure 3 gives an overview of the number of annotations per TV series for each of the 25 emotion labels.
When examining the emotion model, it was concluded that it consists of ten positive emotions (marked
green) and fourteen negative emotions (marked red). Surprise was considered a neutral emotion (marked
blue) until further notice, because it can be either positive or negative depending on the context. However,
considering the content of the different TV series, we expected surprise to have a positive connotation
in Blind Getrouwd (BG), and a negative connotation in both Bloed, Zweet en Luxeproblemen (BZL) and
Ooit Vrij (OV). This hypothesis in relation to the different polarities was based on the topic summaries
of the TV series, as well as the occurring events that were shown throughout the episodes.
A first look at the graph tells us that the frequency scores for BZL and OV are often the opposite
of the results for BG: when BZL and OV have a high frequency for a certain emotion, the frequency for
that emotion is often low for BG, and vice versa. Take for example the results for joy and frustration.
BG scores about twice as high (57) as OV (29), and more than twice as high as BZL (23), for joy, while its frequency for frustration
(9) is not even a quarter of the frequencies for BZL (47) and OV (53). This clear contradiction between
relatively high frequencies for BZL and OV and a lower frequency for BG, or vice versa, was the case
for more than half of the emotion categories. The frequency results per TV series will be discussed in
more detail in the following paragraphs.
Figure 4: Frequencies of emotion categories for Bloed, Zweet en Luxeproblemen
Figure 4 shows the frequency scores for BZL, ranked from high to low. The top 3 consists of sadness,
frustration and disgust, which are all negative emotions. With the addition of irritation and pity, even
the top 5 consists of all negative emotions. When looking at the top 10 of emotions, two positive
emotions are introduced, as well as the temporarily neutral emotion surprise. On the other end of the
frequency scale, it is clear that there are eight emotions which appear less than ten times (lust, envy,
longing, optimism, pride, relief, enthrallment and love). Only one of these emotions is negative, while
the other seven are positive emotions. Among these is lust, which was not even indicated at all in the
dataset from BZL. Interesting to mention here is that none of the Ekman emotions³ appear fewer than ten
times. Two of the Ekman emotions, namely sadness and disgust, are even part of the top 3 most frequent
emotions, with a frequency of 55 and 46, respectively.
³ As mentioned in our literature study (see Section 2.1), the Ekman emotions are anger, disgust, fear, joy,
sadness and surprise.
Figure 5: Frequencies of emotion categories for Blind Getrouwd
Contrary to the results for BZL, the top 3 most frequent emotions for BG does not consist of only
negative emotions. In Figure 5 above it can be seen that the two most frequent emotions are positive,
namely contentment and joy, with the negative emotion nervousness closing the top 3. The results for
BG continue in this positive direction, as the emotions that complete the top 5 are enthusiasm and
optimism, which are both positive again. However, the top 10 shows a more balanced outcome. Five of
the emotions are positive, four are negative, and the top 10 is closed by the neutral surprise. When
looking at the lower frequency scores, we notice that there are many emotions which appear fewer than
ten times. More specifically, there are 11 emotions of that kind, which is almost half of the emotion
categories. Nine of these emotions are negative and only two are positive. Surprisingly, the emotion
with a frequency score of zero happens to be part of Ekman’s emotion set. Whereas disgust was the third
most frequent emotion for BZL, it was never indicated for BG. The second least frequent emotion is an
Ekman emotion as well, namely anger. The rest of Ekman’s basic six were labelled more than ten times,
although joy is the only one that made it to the top 5 and top 3 of most frequent emotions with a frequency
of 57.
Figure 6: Frequencies of emotion categories for Ooit Vrij
Figure 6 shows the results for the final TV series, OV. Similarly to BZL, the top 3 for this dataset only
contains negative emotions, namely frustration, sadness and irritation. This remains the same for the
top 5, with the addition of disappointment and nervousness. The top 10 most frequent emotions shows
the first positive emotions, even though the majority still remains negative: seven negative emotions
compared to only three positive emotions. Six emotions appeared less than ten times, which is a lower
number compared to BZL and BG. Two of those six emotions, one positive and one negative, were
never indicated. Four of Ekman’s emotions (sadness, anger, joy and fear) can be found in the top 10,
but only one in the top 5. That emotion is sadness and is again part of the top 3, which was also the case
for BZL, but this time it is only the second most frequent emotion.
Figure 7: Frequencies of emotion categories for the three TV series combined
The last part of the frequency analysis adds all results together without making a distinction between
the different topics from the dataset as was done in Figure 3. Figure 7 above shows the frequency scores
for all three TV series combined. We can see that the top 3 most frequent emotions consists of two
negative and one positive emotion, respectively sadness, nervousness and joy. Sadness already appeared
in the top 3 of two previously discussed TV series. It was the most frequent emotion for BZL and the
second most frequent emotion for OV. Nervousness also appeared in one of the top 3’s, as it was the
third most frequent emotion for BG. It also appeared in the top 5 for OV and the top 10 for BZL. The
last emotion in the top 3 shown above is joy, which was the second most frequent emotion for BG.
Besides that, it was also part of the top 10 for both BZL and OV.
Frustration and contentment complete the top 5 in this combined graph,
setting the balance at three negative and two positive emotions. Interestingly, each of these two emotions
had the highest frequency score in one of the separate datasets: frustration was
the most frequent emotion for OV and contentment for BG. Frustration was also the second most
frequent emotion for BZL, but remarkably appeared fewer than ten times in BG. Contentment, on the other
hand, appeared in the top 10 for both BZL and OV.
The top 10 shows a relatively equal distribution of positive and negative emotions: five negative
emotions, four positive, and again the neutral emotion surprise. When looking at the spread of the
Ekman emotions, we see that only three made it to the top 10 (sadness, joy and surprise), two of which
appear in the top 3 (sadness and joy).
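The rankings and polarity tallies reported in this section can be reproduced with a simple count. The following is a minimal sketch and not the procedure used in this study: it assumes that each utterance is stored as a list of emotion labels, and both the toy annotations and the partial POLARITY mapping are invented for illustration.

```python
from collections import Counter

# Hypothetical polarity mapping for a handful of labels (illustrative subset only).
POLARITY = {"sadness": "negative", "frustration": "negative", "joy": "positive",
            "contentment": "positive", "surprise": "neutral"}

def rank_emotions(annotations):
    """Count how often each label occurs and rank the labels from high to low."""
    counts = Counter(label for labels in annotations for label in labels)
    return counts.most_common()

def polarity_of_top(ranking, k):
    """Tally the polarities of the k most frequent labels."""
    return Counter(POLARITY[label] for label, _ in ranking[:k])

# Invented toy annotations: one list of labels per utterance.
toy = [["sadness", "frustration"], ["sadness"], ["joy", "surprise"],
       ["sadness", "joy"], ["frustration"], ["contentment"]]
ranking = rank_emotions(toy)
top3 = polarity_of_top(ranking, 3)
```

The same ranking also gives the labels that fall below a frequency threshold, which is the information used later when infrequent emotion categories are excluded.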
4.2 Cluster analysis
The following paragraphs elaborate on the results from the cluster analysis. As already mentioned in
Section 3.3, the hierarchical structures presented below were generated using Ward’s linkage method
(see Section 2.2.3 about cluster linking). Each section is divided into two parts: an initial cluster analysis
and an adapted cluster analysis. The dendrograms are examined with respect to the number and polarity of clusters and
cluster members, as well as the spread of Ekman emotions. The changes between the two
cluster analyses are also examined. Lastly, the final label set is compared to Ekman’s basic emotions set.
Results are presented for each of the TV series, as well as for the overall dataset.
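To make the procedure concrete, the sketch below shows how flat clusters can be obtained from Ward’s linkage with a distance cut-off, using SciPy. It is an illustration under stated assumptions rather than the pipeline used for the dendrograms in this section: the input layout (a binary utterance-by-emotion matrix), the unit-length normalisation of each emotion vector and the toy data are all assumptions. Only the linkage method and the cut-off value of 1.6 are taken from the text.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_emotions(matrix, labels, cutoff=1.6):
    """Cluster emotion labels with Ward's linkage and cut the tree at `cutoff`.

    matrix: utterances x emotions array of 0/1 annotations (assumed layout).
    """
    # Each emotion is represented by its column, scaled to unit length so that
    # frequent and infrequent labels are comparable (an assumption of this sketch).
    vectors = matrix.T.astype(float)
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    vectors = vectors / np.where(norms == 0, 1.0, norms)
    tree = linkage(vectors, method="ward")  # Ward's method on Euclidean distances
    ids = fcluster(tree, t=cutoff, criterion="distance")
    clusters = {}
    for label, cid in zip(labels, ids):
        clusters.setdefault(int(cid), []).append(label)
    return clusters

# Invented toy data: six utterances annotated with four emotion categories.
toy_labels = ["joy", "contentment", "sadness", "frustration"]
toy_matrix = np.array([[1, 1, 0, 0],
                       [1, 1, 0, 0],
                       [1, 0, 0, 0],
                       [0, 0, 1, 0],
                       [0, 0, 1, 1],
                       [0, 0, 1, 1]])
toy_clusters = cluster_emotions(toy_matrix, toy_labels)
```

In this toy example the two positive labels and the two negative labels each merge well below the cut-off, while the link between the two groups lies above it, so two clusters remain. A different cut-off value would yield a different number of clusters for the same tree.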
4.2.1 Bloed, Zweet en Luxeproblemen
Figure 8a: Initial dendrogram for BZL
Figure 8a depicts the dendrogram for Bloed, Zweet en Luxeproblemen using Ward’s linkage method and
applying a distance of 1.6 as the cut-off value. It defines eight different clusters: joy, relief, love, anger,
sadness, remorse, suffering and fear. Of these eight clusters, four have been assigned an emotion label
related to Ekman’s basic emotions (joy, anger, sadness, and fear). When looking at the structure of the
dendrogram, we can clearly see a distinction between two groups based on their polarity. The first three
clusters contain positive emotions, while the last five clusters contain mostly negative emotions.
Compared to Ekman’s set, which only has joy as a positive emotion, these first results are more equally
divided in positive and negative emotion clusters.
A striking observation here is that the fifth cluster contains no less than three of Ekman’s
emotions, namely surprise, disgust and sadness. Another interesting observation is that longing, a
positive emotion, is clustered together with remorse and envy, two negative emotions.
Figure 8b: Adapted dendrogram for BZL without infrequent emotions
Figure 8b shows the adapted dendrogram after excluding the emotion categories that were indicated fewer
than ten times. As expected, there are fewer clusters than in the original hierarchical structure, but
the two structures are very similar. Cluster 1 contains enthusiasm, contentment and joy; cluster 2 consists of rejection,
disappointment, anger, frustration and irritation; cluster 3 comprises surprise, disgust, pity and sadness;
and suffering and torment form cluster 4. These four clusters have remained the same. Cluster 5 has remorse,
fear and nervousness as its members, which shows a new link between the previously existing cluster
of fear and nervousness and the newly added remorse. When looking back at Figure 8a, we can see that
remorse used to be grouped together with envy and longing. As the latter two were removed, remorse
shifted to one of the other clusters. This new dendrogram results in five clusters and a final label set with
joy, anger, sadness, suffering and fear.
Furthermore, as the one original cluster that contained both negative emotions and a positive
emotion has been split and partially removed, all clusters now consist of either only positive or only
negative members (with the exception of surprise). However, compared to the first cluster analysis for
BZL, it is clear that the distribution between positive and negative emotions is now less equal than
before: one positive cluster against four negative clusters. The only negative emotion label that
disappeared from the original cluster labels is remorse, but the one remaining member of that cluster is
now part of the fear cluster. Two of the positive labels have disappeared, which leaves only joy. This is
very similar to the distribution in Ekman’s emotion set. Overall, our final label set shows a fair
resemblance to Ekman’s basic emotions, with a total of four shared emotion labels (joy, anger, sadness
and fear).
4.2.2 Blind Getrouwd
Figure 9a: Initial dendrogram for BG
Figure 9a shows the dendrogram for Blind Getrouwd, with the same linkage method and cut-off value
used for BZL. The first cluster analysis results once again in eight separate clusters: sadness, anger, joy,
suffering, surprise, pity, longing and love. Similar to the results for BZL, four out of the eight clusters
have been assigned an emotion label from Ekman’s basic emotions (sadness, anger, joy and surprise).
When examining the polarity of the clusters, we can see a mixture of three positive labels, four negative
labels and the temporarily neutral surprise. However, as the surprise cluster otherwise contains only negative
emotions, we will consider this entire cluster to be negative as well, regardless of the neutral umbrella
term. This makes for a total of three positive cluster labels and five negative cluster labels. Yet again,
this is a fairly equal distribution in terms of polarity, and definitely more equal than within Ekman’s
emotion set.
In addition, some further interesting observations are to be found in this dendrogram. There is
one cluster which again, similarly to the results for BZL, contains more than one Ekman emotion,
namely the third cluster with both surprise and fear. Additionally, longing and envy are once more
clustered together, even though they are respectively a positive and a negative emotion. This was also
the case with the initial cluster analysis for BZL.
Figure 9b: Adapted dendrogram for BG without infrequent emotions
Figure 9b shows the adapted dendrogram for the second cluster analysis. What immediately becomes
clear is that there are remarkably fewer clusters. Due to the many emotion categories that were indicated
fewer than ten times, the number of clusters was halved. The clusters that remain, however, are very
similar to the original clusters. The sadness, surprise and joy clusters have stayed intact, while the love
cluster has both lost members and gained a new member. After envy was removed, instead of forming a
cluster on its own, longing joined the positive love cluster. This gives us a total of four clusters: cluster
1, labelled sadness, consists of rejection, disappointment and sadness; cluster 2, labelled surprise,
contains surprise, fear and nervousness; cluster 3, labelled joy, comprises enthusiasm, contentment, joy,
optimism and pride; and finally cluster 4, labelled love, covers enthrallment and love as well as longing.
When judging on polarity, we see a balance of two negative and two positive emotion clusters,
respectively sadness and surprise in contrast with joy and love. This balance is undoubtedly more equal
than Ekman’s distribution. Comparing the final labels to Ekman’s basic emotions, it can be observed
that our clusters only have three labels in common with Ekman, namely sadness, surprise and joy.
Surprisingly, one of the clusters that disappeared is anger, which happens to be an Ekman emotion. But
then again, that particular emotion category was only indicated once for BG, and the other members of
the cluster also occurred less than ten times. Disgust, another Ekman emotion, was never even indicated
and was therefore excluded from both cluster analyses.
4.2.3 Ooit Vrij
Figure 10a: Initial dendrogram for OV
Figure 10a plots the dendrogram for Ooit Vrij, again adopting the same linkage method and cut-off value
as for the previously discussed structures. Similar to the other TV series, the first cluster analysis for
OV results in eight different clusters: joy, optimism, anger, disgust, sadness, fear, lust and surprise.
Remarkably, all six of Ekman’s basic emotions appear in different clusters, meaning that six of the
clusters are labelled with one of Ekman’s emotions. When looking at the polarity of the clusters, we
again see a mixture of positive and negative emotions. There are three positive emotion clusters, four
negative clusters and one labelled with the temporarily neutral surprise. Given the fact that the emotion
category surprise is only clustered together with pity, a negative emotion, we will consider this cluster
to be negative as well. This results in a final five negative clusters against three positive clusters, which
is still more equally distributed than Ekman’s emotion set. As this has also been the case for the previous
topics, it could mean that a first unedited cluster analysis would typically produce more equally
distributed clusters in the sense of polarity.
Contrary to the first cluster analyses of BZL and BG, Figure 10a shows that all negative
emotions are grouped exclusively with negative emotions, and the same applies to all positive emotions.
However, the clustering results for OV show the first cluster in this research that consists of only one
emotion category, which in this case is lust.
Figure 10b: Adapted dendrogram for OV without infrequent emotions
Figure 10b shows the last adapted dendrogram generated for domain-specific cluster analysis. As
opposed to the other adapted dendrograms, this one shows a rather high number of clusters, seven to be
precise. At first glance, it seems as if not much has changed, because some of the structures have
remained intact. The only cluster which completely disappeared is the cluster that consisted of only one
member, lust. This results in a final label set with joy, optimism, anger, sadness, disgust, fear and
surprise, which means that all six of Ekman’s emotion labels were preserved. Cluster 1 is formed by
contentment, enthusiasm and joy; cluster 2 consists of love, longing and optimism; cluster 3 covers
anger, frustration and irritation; cluster 4 groups together rejection and sadness; cluster 5 contains
disgust, suffering and torment; cluster 6 comprises fear and nervousness; and lastly cluster 7 compiles
surprise, disappointment and remorse.
A striking observation is that one of the original clusters, namely the sadness cluster, has been
split into two parts, each with two members. Rejection and sadness stay linked and remain in the sadness
cluster, while disappointment and remorse are now clustered together with surprise.
When we assess the polarity of the clusters, it is obvious that not much has changed. The one
cluster that disappeared was a positive emotion cluster, so now the distribution is five negative clusters
against only two positive clusters instead of three. Based on the distribution of polarity and the emotion
labels of the clusters, these results show a great resemblance to Ekman’s basic emotions.
4.2.4 Combined data
As already mentioned in our methodology, the three TV series for our research were chosen because of
their different topics. As part of this research, we also wanted to investigate whether emotion clusters
resulting from experimental cluster analysis are dependent on the topic of the data. To answer this sub-
question about domain adaptation, we combined the annotation results from the three TV series without
differentiating between the topics and used the complete dataset as input for a separate dendrogram. The
results from the cluster analysis are described in the following paragraphs.
Figure 11a: Initial dendrogram for the three TV series combined
Just like with BZL, BG and OV, we selected the dendrogram generated with Ward’s method and decided
on a cut-off value of 1.6. As can be seen in Figure 11a, this results in eight different clusters, which was
also the case for the topical cluster analyses. Cluster 1 contains enthusiasm, contentment and joy; cluster
2 lust, envy and longing; cluster 3 relief, enthrallment, love, optimism and pride; cluster 4 anger,
frustration and irritation; cluster 5 rejection, disappointment and sadness; cluster 6 remorse, surprise,
disgust and pity; cluster 7 suffering and torment; and finally cluster 8 contains fear and nervousness. The
preliminary label set for the combined data now consists of joy, longing, optimism, anger, sadness,
surprise, suffering and fear, which shows a fair resemblance to Ekman’s emotion set.
Figure 11a also clearly shows two distinct groups within the hierarchical structure. The first
three clusters are positive emotion clusters, while the last five clusters are negative emotion clusters.
Even though cluster 6 is labelled with the neutral surprise, it can be considered a negative cluster because
all other emotions in that cluster are negative. Compared to Ekman’s set, these results are more equally
divided in terms of polarity. Another interesting remark about this cluster is that it contains two of
Ekman’s basic emotions, namely surprise and disgust.
When looking at the members within the separate clusters, we can see that all members are
either exclusively positive or negative, except for the members of the second cluster. In this positive
cluster, lust and longing, two positive emotions, are grouped together with envy, a negative emotion. It
is now the third time that such an observation has been made, as this already occurred within two of the
topical cluster analyses, both times also involving longing and envy. This could possibly mean that these
two emotions often occur together, or that longing tends to be clustered together with negative emotions,
but further thoughts on this and the rest of the results will be discussed more elaborately in the
discussion.
Figure 11b: Adapted dendrogram for the combined data without infrequent emotions
Figure 11b shows the adapted dendrogram for the dataset of all three TV series combined. As opposed
to the previous adapted dendrograms, the emotion categories removed for this dendrogram were those that
occurred fewer than thirty times, instead of ten times. Considering the fact that the combined dataset is
approximately three times the size of the dataset per TV series, the threshold was tripled as well.
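The exclusion step can be sketched as a column filter on the annotation matrix. This is a hedged illustration only: the matrix layout (utterances by emotion categories, with 0/1 entries), the function name drop_infrequent and the toy data are assumptions. The thresholds of ten (per TV series) and thirty (combined data) are the values mentioned in the text.

```python
import numpy as np

def drop_infrequent(matrix, labels, threshold=10):
    """Keep only the emotion columns that were indicated at least `threshold` times."""
    counts = matrix.sum(axis=0)          # frequency of each emotion category
    keep = counts >= threshold           # categories below the threshold are excluded
    kept_labels = [label for label, flag in zip(labels, keep) if flag]
    return matrix[:, keep], kept_labels

# Invented toy data: 12 utterances, three emotion categories
# with frequencies 12, 9 and 10 respectively.
toy_labels = ["joy", "lust", "fear"]
toy_matrix = np.zeros((12, 3), dtype=int)
toy_matrix[:, 0] = 1
toy_matrix[:9, 1] = 1
toy_matrix[:10, 2] = 1

per_series, kept_per_series = drop_infrequent(toy_matrix, toy_labels, threshold=10)
combined, kept_combined = drop_infrequent(toy_matrix, toy_labels, threshold=30)
```

The filtered matrix would then serve as input for a second round of clustering, which is why clusters whose members all fall below the threshold vanish from the adapted dendrograms.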
Similarly to the results for the domain-specific cluster analysis, the adapted dendrogram shows
fewer clusters than before. There are now only six clusters compared to the original eight. However,
when we take a closer look at the composition of the clusters, we see that none of the original clusters
completely disappeared. All clusters are still represented by at least one cluster member. Four of the
original clusters stayed intact, one was joined by a member of another cluster, and another one lost some
original members while gaining a new one. This results in these final six clusters: cluster 1 remained
intact with enthusiasm, contentment and joy; cluster 2 still contains love and optimism and was joined
by longing; cluster 3 also stayed intact with anger, frustration and irritation; cluster 4 remained the
same with fear and nervousness, but shifted to another position in the hierarchical structure; cluster
5 still consists of the original members rejection, disappointment and sadness, and gained a new
member, namely suffering; and lastly cluster 6 is an original cluster as well, composed of remorse,
surprise, disgust and pity. This results in a final label set with joy, optimism, anger, fear, sadness and
surprise. With only one deviating emotion label (optimism instead of disgust, which is part of the
surprise cluster), this shows a great resemblance to Ekman’s basic emotions set.
In terms of polarity, we can see that the positive emotions are still outnumbered in the final label
set. As the surprise cluster has not changed and so can be considered a negative cluster, this leaves four
negative emotion labels (anger, fear, sadness and surprise) against only two positive emotion labels
(joy and optimism). Compared to Ekman’s set, though, this is already an improvement. Just like in the
initial dendrogram for the combined data, these two polarities are again clearly separated in the
hierarchical structure. Contrary to the clusters in Figure 11a, all clusters now consist of all positive or
all negative emotions. The orange cluster, which contained members of different polarities, has been
split. Two of the members were removed when applying the threshold of thirty, and the one remaining
member longing, a positive emotion, was grouped together with the remaining members of the positive
optimism cluster.
5 DISCUSSION
In this part of the paper, we will discuss the results from our research, compare them to previous research,
and comment on certain characteristics. First of all, the results will be examined in more detail. The
most striking observations are selected for further analysis and will be presented per subject. We will
attempt to give an explanation for certain observations by linking the frequency analysis to the different
cluster analyses. Some other aspects of the data, such as topic and polarity, will be taken into account
as well. Furthermore, we will compare our outcome to the results from a similar study performed on
Dutch tweets (De Bruyne et al., 2019) and discuss some interesting similarities and differences in terms
of clustering, polarity and of course the final label set. Finally, we will briefly discuss the validity and
reliability of our research, as well as comment on the added value of this research in the field of natural
language processing (NLP).
5.1 Analysis of the results
A first interesting observation was made when analysing the frequency overview. The results showed
that BZL and OV often score rather high for a certain emotion while BG will score relatively low for
that same emotion category, or vice versa. This can be explained by the difference in topics. Even though
all three TV series showed both positive and negative emotions, during the transcription and annotation
task it appeared that the majority of the events shown in BZL and OV were predominantly negative,
while those in BG were found to be predominantly positive. We had expected to see an impact of this
in the results, and this expectation was met quite early in our analysis. This matter of different
predominant polarities is also reflected in the frequency tables per TV series: the top 3 and top 5 most
frequent emotions for BZL and OV contained solely negative emotions, whereas the top 3 and top 5 for
BG only contained a single negative emotion. The same goes for the emotion categories that were
indicated less than ten times, which were mostly positive for BZL and OV and mostly negative for BG.
When comparing the cluster analyses for the three TV series, we can see that the first cluster analysis
always resulted in a total of eight emotion clusters. Even the initial dendrogram for the combined data
revealed eight clusters. This is presumably just a coincidence related to the linkage method and cut-off
value that were selected. Had we kept Ward’s method but decided on a different cut-off value,
it is very likely that the number of clusters would have varied. Besides that, a more relevant observation
regarding the clusters is that the first cluster analysis always resulted in clusters that were fairly equally
divided in terms of polarity. In fact, the initial distribution is three positive emotion clusters and five
negative emotion clusters for all three TV series and even for the combined data as well. When analysing
the distribution in polarity, we have to keep in mind that the emotion model used for this research
contains more negative than positive emotions (respectively fourteen against ten), and so it is more likely
that there would be fewer positive clusters than negative clusters. Logically though, this would only be
the case if positive emotions are primarily grouped together with other positive emotions (and vice
versa), and if the clusters are relatively similar in size. As we have mentioned in our literature study
(Section 2.2.3.4), clusters generated with Ward’s method are indeed usually fairly equal in size, so the
assumption made here can be considered plausible.
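The reasoning above can be checked with a back-of-the-envelope calculation. The sketch below is only an illustration of that argument: it takes the label counts from the text (fourteen negative and ten positive categories, leaving out the neutral surprise) and assumes, as stated above, that clusters are equal in size and never mix polarities.

```python
# Label counts taken from the text; surprise (neutral) is left out of this check.
n_negative, n_positive, n_clusters = 14, 10, 8

# Assumptions: all clusters are equal in size and contain a single polarity.
labels_per_cluster = (n_negative + n_positive) / n_clusters
expected_positive = n_positive / labels_per_cluster
expected_negative = n_negative / labels_per_cluster
```

Under these assumptions, roughly 3.3 of the eight clusters would be positive and roughly 4.7 negative, which is in line with the observed split of three positive against five negative clusters.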
The second cluster analysis, where the emotion categories with a frequency score lower than ten
were excluded, clearly shows that more discrepancies occurred between the TV series. First of all, the
number of final clusters differs. The eight original clusters were reduced to five final clusters for BZL
and only four for BG, while OV still retained a total of seven clusters. Examining the polarity of the
clusters, we can see that the fairly equal distribution from before has now shifted towards the negative
side, which corresponds to the unbalanced distribution in Ekman’s basic emotions set. Only one out of
five remaining clusters for BZL is positive, and for OV only two out of seven clusters are positive. The
final clusters for BG are the only exception, as the second dendrogram does not show a polarity shift
towards the negative side. In fact, the adapted clusters show a perfect balance of two positive and two
negative clusters.
This shift in polarity can be explained by the clusters that were removed, which is linked to the
frequency analysis. For BZL, the emotion categories that were indicated less than ten times were almost
all positive. Consequently, the two clusters of which all members completely disappeared (the orange
and yellow clusters in Figure 8a) were both positive, leaving only one positive cluster and so one positive
emotion label in the final label set. For OV, the only cluster that completely disappeared (the pink cluster
in Figure 10a) was a positive cluster. Even though it only consisted of one emotion category, this positive
cluster label was still removed from the final label set, increasing the imbalance between positive and
negative emotion labels even more. The other emotion categories that were indicated less than ten times
were all part of different clusters. So even after removing those categories, the final emotion label and
the polarity of the cluster were still represented by other members in that cluster. On the other hand, the
clusters that completely disappeared for the second cluster analysis of BG (the orange, green and purple
clusters in Figure 9a) were all negative, leaving only two negative emotions in the final label set. Then
again, the remaining members of two of the positive clusters merged into one cluster, decreasing the
total number of positive emotion labels from three to two, but causing the perfect balance between
positive and negative cases.
Additionally, there are some interesting remarks to be made about how the positive and negative clusters
are linked. If we take a look at both the initial and adapted dendrograms for BZL, we can see that there
is a clear separation between the positive and negative clusters. The initial dendrogram shows three
positive clusters grouped together on the left, and five negative clusters grouped together on the right.
The adapted dendrogram shows four negative clusters grouped together on the right, which are then
linked to the one positive cluster on the left. The same observation can be made with the adapted
dendrogram for OV: five negative clusters are grouped together on the right side of the dendrogram and
are then linked to the two positive clusters grouped together on the left side.
At first sight, it seems as if the adapted dendrogram for BG too shows a clear separation in terms
of polarity. However, when we take a closer look at the hierarchical structure, it becomes clear that
disregarding the highest linking line does not separate the two positive clusters from the two negative
clusters. First the two positive clusters on the right side are grouped, and then this positive group is
linked to a negative cluster. Lastly, the final negative cluster on the left is linked to this group of three
clusters with different polarities. While the polarities are not really mixed, this is still a significant
difference compared to some of the other dendrograms where the highest line is the only link between
positive and negative clusters.
When we look at the initial dendrogram for BG, though, we immediately notice that the initial
clustering does not result in two distinct groups of all positive clusters on one side and all negative
clusters on the other. The polarities are mixed: on the far right, we start with two positive clusters
grouped together. These are then linked to a negative cluster, and then this group is linked to another
group of two negative clusters. These five clusters (of which two are now positive and three negative)
are then linked with a positive cluster. Finally, this group of six clusters on the right is connected to a
group of two negative clusters on the left. A first hypothesis as to why in this case the polarities are
mixed, is that perhaps certain utterances from BG were labelled with both positive and negative emotion
categories. A similar observation can be made with the initial dendrogram for OV, as this dendrogram
too shows a positive cluster intruding in the negative cluster group.
The clusters for OV show almost the exact same polarity distribution, and the initial dendrogram
for OV would have already achieved the perfect separation between positive and negative clusters that
can be seen in the adapted dendrogram, had it not been for the positive lust cluster amidst the negative
cluster group. This positive cluster is first grouped with a negative cluster, and then four more negative
clusters are linked to the group. Finally, the highest linking line connects this group of almost all negative
clusters on the right side of the dendrogram to a group of two positive clusters on the left.
Furthermore, we want to analyse some odd behaviour of certain emotion categories. A first emotion that
is worth discussing is surprise. As mentioned before, surprise was considered a neutral emotion, as it
can be either positive or negative depending on the context. Based on the topics of the TV series, we
hypothesised that surprise would be positive for BG, but negative for BZL and OV. Yet, when
we take a closer look at the emotion categories that were clustered together with surprise, we can see
that they are all negative for each of the TV series. For BZL, surprise is clustered with disgust, pity and
sadness. The dendrograms for BG show that the surprise cluster further consists of fear and nervousness.
For OV, surprise was first linked with pity, and then in the adapted dendrogram it was grouped with
disappointment and remorse. Even the combined data shows all other emotions in the surprise cluster,
namely remorse, disgust and pity, to be negative. From this observation, we can derive that surprise
often occurred in utterances with negative emotions, which might imply that it should be considered
a negative emotion as well. Our previous hypothesis on the polarity of surprise can therefore be rejected.
However, if we examine the annotations in more detail, there are some interesting results. We
have counted the number of times that each of the emotion categories was indicated with surprise, and
some of the observations actually oppose the cluster formation and instead support our initial hypothesis
on the polarity of surprise. As already mentioned, the surprise cluster for BZL further contained disgust,
pity and sadness, which corresponds to the annotations: those three emotions are indeed the labels with
which surprise was indicated the most for this topic (respectively 11, 9 and 11 times). For BG, though,
the emotion categories that were clustered together with surprise (fear and nervousness) were not the
most frequent emotions to be annotated with surprise. While fear was only indicated 3 times in
combination with surprise and nervousness only 8 times, contentment, enthusiasm and joy were each
indicated ten times or more. What is so striking about this observation, is that for BG the three most
indicated emotions in combination with surprise are all positive, meaning that surprise more frequently
occurred with positive emotions. This conflicts with what is shown in the emotion clusters for BG, as
all other members in the surprise cluster were negative, making surprise seem negative as well. If we
then take a closer look at the data for OV, we notice that the emotions that occurred the most with
surprise are frustration and nervousness, which were both indicated 6 times. Again, these emotion labels
do not match the members in the surprise clusters, as the initial dendrogram shows surprise in a cluster
with pity and the adapted dendrogram shows disappointment and remorse clustered together with
surprise. Nevertheless, seeing that frustration and nervousness are negative emotions just like pity,
disappointment and remorse, those alternative combinations would still result in surprise being
considered negative, and so for this topic our hypothesis on the polarity of surprise can be confirmed.
Overall, there was a marked difference between the number of co-occurrences of surprise with positive
emotions and with negative emotions. The frequency scores of surprise being indicated in combination with
either positive or negative emotions are presented in Table 1 below. It becomes immediately clear that
surprise was more often indicated with negative emotions for both BZL and OV, while it was more
often indicated with positive emotions for BG. In relation to the polarity of surprise, this means that
surprise can be considered negative for two and positive for one of the topics. As opposed to the
clustering results, these observations from the annotation dataset correspond to our hypothesis on the
polarity of surprise.
Furthermore, these frequency scores also reveal another characteristic of surprise. In the entire
dataset, surprise was indicated a total of 58 times, but interestingly only occurred on its own twice. All
56 other times, it was indicated in combination with one or more emotion categories. In fact, almost all
emotion categories from our label set (22 out of 25 labels, or 24 if you leave out surprise) were indicated
at least once in combination with surprise. This implies that surprise is a versatile, ever-changing
emotion that can be paired with all sorts of emotions. On the other hand, it might also suggest that
surprise can often be considered as a trigger emotion followed by another more prominent ‘main’
emotion.
                    BZL    BG    OV
negative emotion     57    12    32
positive emotion     30    42     9
no other emotion      1     0     1
Table 1: Polarity combinations with surprise
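A tally like the one in Table 1 can be derived from multi-label annotations in a few lines. The following is a minimal sketch under stated assumptions: the toy annotations, the partial POSITIVE and NEGATIVE sets and the function name polarity_combinations are hypothetical and do not reproduce the thesis data or tooling. Every co-occurring label is counted once, which is why the cells of such a table can add up to more than the number of utterances labelled with surprise.

```python
from collections import Counter

# Hypothetical polarity assignment, limited to the labels in the toy data.
POSITIVE = {"joy", "contentment", "enthusiasm"}
NEGATIVE = {"disgust", "pity", "sadness", "fear", "nervousness"}


def polarity_combinations(annotations, target="surprise"):
    """Count co-occurrences of `target` with positive/negative labels per topic.

    `annotations` is an iterable of (topic, set_of_labels) pairs.
    Each co-occurring label is counted once, so a single utterance can
    contribute to both the positive and the negative row.
    """
    table = {}
    for topic, labels in annotations:
        if target not in labels:
            continue
        counts = table.setdefault(topic, Counter())
        others = labels - {target}
        if not others:
            counts["no other emotion"] += 1
        for label in others:
            if label in POSITIVE:
                counts["positive emotion"] += 1
            elif label in NEGATIVE:
                counts["negative emotion"] += 1
    return table


# Toy data, invented for illustration only.
toy = [
    ("BG", {"surprise", "joy", "enthusiasm"}),
    ("BG", {"surprise", "nervousness"}),
    ("BZL", {"surprise", "disgust", "pity"}),
    ("BZL", {"surprise"}),
    ("OV", {"frustration"}),
]
result = polarity_combinations(toy)
for topic, counts in sorted(result.items()):
    print(topic, dict(counts))
```

Counting per label rather than per utterance is a design choice; counting each utterance once per polarity would give smaller figures.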
Another interesting emotion to examine is longing, which is one of the ten positive emotions in our label
set. When examining the other cluster members, though, we notice that longing was occasionally
grouped together with negative emotions. The initial dendrogram for BZL shows longing in the same
cluster as remorse and envy, two negative emotions. In the initial dendrogram for BG, longing again
shares a cluster with envy. It is only after the infrequent emotion categories for BG are removed that
longing becomes part of a positive cluster with enthrallment and love. The initial dendrogram for the
combined data also shows longing clustered together with envy, but then again it is grouped with the
positive emotion lust as well. This repeated link between longing and envy might indicate that these two
emotion categories often occurred together. Interestingly, when we take a look at the frequency scores,
we see that envy was always removed for the adapted cluster analysis, while this only happened once
with longing. This suggests that utterances labelled with envy were usually also labelled with longing,
but not the other way around. In other words, envy probably triggered the feeling of longing too, while
longing did not always trigger envy and so could be labelled on its own. This connection between envy
and longing can initially be supported by simply thinking about situations in which these two emotions
would occur together: if one is envious of someone or something, they would most likely long for that
specific situation to happen to them (Utterance 1). But when an individual longs for someone or
something, there is not always a specific person that they envy; it could just be a general feeling
(Utterance 2). To clarify, we present some supporting examples from the annotations dataset:
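Utterance 1: “Ik kom uit een euh een warm gezin. Mijn ouders zijn toch al heel lang getrouwd.
Tis een heel warm, goed koppel. Tzijn ook mensen die echt wel pro-huwelijk zijn. En zo wil ik
het ook wel voor mezelf.”
(English: “I come from a, uh, a warm family. My parents have been married for a very long time.
They are a very warm, good couple. They are also people who are really pro-marriage. And that is
how I want it for myself too.”)
Utterance 2: “In mijn leven is alles eigenlijk compleet. Kheb een toffe job, supertoffe vrienden
euh, lieve familie. Alles is er euh om gelukkig te zijn buiten da euh da één stukje da denk ik toch
alles wel compleet kan maken. Tis volgens mij wel leuker me twee.”
(English: “Everything in my life is actually complete. I have a great job, really great friends,
uh, a lovely family. Everything is there, uh, to be happy, except that, uh, that one piece that I
think could make everything complete. I think it is more fun with two.”)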
Utterance 1 was labelled with both envy and longing, while utterance 2 was only labelled with longing.
The speaker of utterance 1 mentions their parents and how they want to have a loving marriage like
theirs, which is the reason why not only longing but also envy was indicated for this utterance. With
utterance 2, on the other hand, the speaker merely says what they would like to experience and does not
mention a specific person they are envious of. That is why only longing was indicated for this utterance.
In addition, we can refer back to the annotations and frequency analysis to further support our findings.
The frequency overview (Figure 3) tells us how many times envy and longing were labelled for each of
the TV series. After further examining the annotations, we have come to the conclusion that for BZL
envy and longing were always indicated on their own. Envy was never indicated for OV, and so longing
was always labelled separately. For BG though, these two emotions also occurred together: apart from
the 11 times that longing was indicated by itself, it was also indicated all four times that envy was
labelled, meaning that envy only occurred in combination with longing. However, we should mention
that these two emotions do not occur together by default. Longing was
more frequently linked to other positive emotions such as love and optimism and was part of positive
emotion clusters too, which is shown in both dendrograms for OV and the general dataset, as well as the
adapted dendrogram for BG, which was already mentioned earlier.
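The asymmetry between envy and longing can be made explicit by comparing two conditional rates. The sketch below is illustrative only: it rebuilds the BG counts mentioned above as toy data (4 utterances with both labels, 11 with longing but not envy), and it assumes that "by itself" means "without envy". The helper name conditional_rate is hypothetical.

```python
def conditional_rate(annotations, given, other):
    """Share of utterances labelled `given` that are also labelled `other`."""
    with_given = [labels for labels in annotations if given in labels]
    if not with_given:
        return None  # undefined when `given` never occurs
    return sum(other in labels for labels in with_given) / len(with_given)


# Toy reconstruction of the BG counts (assumption: "by itself" = without envy).
bg = [{"envy", "longing"}] * 4 + [{"longing"}] * 11

longing_given_envy = conditional_rate(bg, "envy", "longing")  # 4 out of 4
envy_given_longing = conditional_rate(bg, "longing", "envy")  # 4 out of 15

print(longing_given_envy, round(envy_given_longing, 3))
```

Under these assumptions, every envy utterance also carries longing, while only about a quarter of the longing utterances carry envy.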
One more striking observation is the continuous clustering of certain emotion categories. This occurred
for two groups of three emotions: the negative group of anger, frustration and irritation, as well as the
positive group of joy, contentment and enthusiasm.
For all three topical datasets and even the combined dataset, anger, frustration and irritation
were clustered together. Utterances 3 and 4 below show examples from the annotations where all
three emotions were labelled at once. The only dendrogram where this cluster does not appear is the
adapted dendrogram for BG, as those emotion categories were removed from the set for the second
cluster analysis. Both dendrograms for BZL show anger, frustration and irritation grouped together
with rejection and disappointment as well, but in all the other dendrograms, those three emotion
categories form a cluster on their own. After taking a closer look at the annotations, we counted 40
utterances where anger, frustration and irritation were indicated together. For 43 other utterances, one
of these three emotions was annotated in combination with another one of the three emotions.
Considering the fact that anger only appeared 57 times in the entire dataset, this tells us that anger
primarily occurred alongside other emotions.
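Counts of this kind (all three labels together versus two of the three) can be obtained by grouping utterances according to how many labels of a given group they carry. This is a minimal sketch with invented toy data; the function name group_overlap is hypothetical and the numbers are not those of the thesis.

```python
def group_overlap(annotations, group):
    """Count utterances by the number of labels from `group` they carry."""
    group = set(group)
    counts = {n: 0 for n in range(len(group) + 1)}
    for labels in annotations:
        counts[len(group & set(labels))] += 1
    return counts


ANGER_GROUP = {"anger", "frustration", "irritation"}

# Toy data, invented for illustration only.
toy = [
    {"anger", "frustration", "irritation"},
    {"anger", "irritation"},
    {"frustration"},
    {"joy"},
    {"anger", "frustration", "irritation", "disappointment"},
]
counts = group_overlap(toy, ANGER_GROUP)
print(counts)
```

The same helper applies unchanged to the positive group of joy, contentment and enthusiasm.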
This remarkable pattern either suggests that anger, frustration and irritation regularly occur
together, or that the difference in meaning between the three is rather small. In fact, the latter explanation
is supported by certain emotion frameworks, as some do not even view these labels as separate emotions.
Basic emotion frameworks in particular often capture frustration and irritation under the umbrella term
anger. Take for example Ekman’s Atlas of Emotions (atlasofemotions.org), which is an educational tool
about the five most agreed on universal emotions that form the foundation for many emotion frameworks
(see Section 2.1.2.3). This model considers frustration as one of the states of anger, not as a separate
emotion. Parrott (2001), and earlier also Shaver et al. (1987), even classify frustration as a
tertiary and irritation as a secondary emotion of the primary emotion anger. In fact, many basic emotion
sets (Tomkins, 1984; Plutchik, 1980; Izard, 1971) are fairly limited and only include anger, whereas the
more extensive emotion frameworks (Shaver et al., 1987; Russell, 1980; Schröder, Pirker, & Lamolle,
2006) sometimes also include frustration and in some cases even irritation.
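Utterance 3: “Gij komt hier altijd met van die stomme flauwekul, gij. Kheb da nie nodig.”
(English: “You always come here with that kind of stupid nonsense. I don't need that.”)
Utterance 4: “(baas roept hurry up) - Jaaaaaaah. Gohhhh.”
(English: “(boss shouts hurry up) - Yeaaaaah. Ugh.”)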
Besides this group of negative emotions, joy, contentment and enthusiasm were always clustered
together as well, this time without any exceptions. The clustering of these positive emotions can be seen
in each and every one of the dendrograms, though not always as a separate cluster. Both dendrograms
for BG show this group of three being joined by two other cluster members, namely optimism and pride.
In the initial dendrogram for OV, it is only pride that completes the joy cluster, but this emotion category
is then removed for the second clustering due to the frequency threshold of ten. We again examined the
annotations and counted a total of 55 instances where joy, enthusiasm and contentment were indicated
for the same utterance. This is about half of the number of times that each of these emotions occurred
(the total frequency score in the entire dataset is 109 for joy, 107 for contentment and 93 for enthusiasm).
In 55 other cases, two of these three emotions were annotated together. Utterances 5 and 6 are examples
of utterances for which all three emotion categories were indicated.
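Utterance 5: “Oh, top! Nee? Echt wel chill, eh?”
(English: “Oh, great! No? Really chill, right?”)
Utterance 6: “Eindelijk ga ik vannacht in een goe bed slapen. Amai. Die flutmatrasjes, kga ze
nie missen. Voila.”
(English: “Finally I get to sleep in a good bed tonight. Wow. Those lousy little mattresses, I am
not going to miss them. There.”)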
A final aspect of the results that we want to examine is the spread of Ekman’s basic emotions.
It is remarkable that almost all dendrograms show a cluster that contains multiple Ekman emotions.
The only structures in which the incorporated Ekman emotions are spread over different clusters are
the initial and the adapted dendrogram for OV. All other topical datasets and the combined dataset as
well generate one particular cluster with two or even three Ekman emotions. The cluster for BZL
contains surprise, disgust and sadness from Ekman’s set. For BG, surprise and fear are clustered
together. Lastly, the combined dataset shows surprise and disgust in the same cluster. Interestingly,
these clusters remained intact in the adapted dendrograms. What immediately stands out when
examining these labels is that surprise is always included in the cluster with more than one Ekman
emotion. We think this can be explained by the variable nature of surprise. As we have already
mentioned, surprise can be either a positive or a negative emotion. Furthermore, surprise can be paired
with a variety of other emotions, as it does not often occur alone, but rather in combination with another
emotion category (see Table 1). Surprise can then be regarded as some sort of trigger emotion, followed
by another more prominent emotion. It has also come to our attention that the polarity of surprise then
depends on the polarity of the other emotions in that utterance: if for example surprise triggers the
positive emotion joy, then surprise itself will be considered a positive emotion as well. Utterance 7
presents an example where surprise triggered the emotions contentment, enthusiasm and joy. As these
are all positive emotions, for this utterance surprise would be considered a positive emotion as well.
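Utterance 7: “Man, man, man. Waar ben ik aan begonnen? - Ik ga trouwen! - Oh my god! - Ho
jongen, da meende nie? Toch nog vant straat. Proficiat. Nu gaat beginnen.”
(English: “Man, man, man. What have I got myself into? - I am getting married! - Oh my god! - Oh
boy, you are not serious? Off the market after all. Congratulations. Now it is going to begin.”)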
5.2 Comparison to previous research
To evaluate the outcome of our research, we would like to compare our results to those for another
domain, as well as to an already existing and frequently used emotion framework. De Bruyne et al.
(2019) have conducted a frequency and cluster analysis similar to ours, but instead of TV series, they
used Dutch tweets as data. Their study resulted in an empirically grounded framework intended for
domain-specific automatic emotion detection on tweets. To further examine the resemblance of our
results to already existing frameworks, our final label set will also be compared to Ekman’s well-known
basic emotions set.
We decided to compare the findings of our analysis to those of De Bruyne et al. (2019), which
revealed a number of similarities. Firstly, even though they adopted a different linking method and cut-
off value, their first cluster analysis (Figure 12a) resulted in a total of 8 clusters. This was also the case
for all four of our datasets. While this is a rather superficial observation that is probably a mere
coincidence, it remains a first noticeable similarity between both studies.
Figure 12a: Initial dendrogram for tweets (De Bruyne et al., 2019)
Figure 12b: Adapted dendrogram for tweets without infrequent emotions (De Bruyne et al., 2019)
When we take a closer look at the dendrograms for tweets, we see that some of our striking observations
presented in the section above recur in this study as well. One of the remarkable things all results have
in common is the repeated clustering of two different groups: anger, frustration and irritation on the
one hand, and joy, contentment and enthusiasm on the other hand. As we have pointed out in our analysis
above, these two groups mostly occurred on their own, but sometimes also shared a cluster with other
emotion categories. In the dendrograms for tweets, however, only the latter can be observed: in both
dendrograms, the negative cluster with anger, frustration and irritation also contains disgust and
disappointment, and the positive cluster with joy, contentment and enthusiasm is completed by pride.
Interestingly, disappointment and pride belong to the selection of additional emotions that in some cases
were clustered together with one of the two groups in our research as well.
Furthermore, we have noticed that in both the initial and adapted dendrogram for tweets,
surprise is clustered with solely negative emotions, namely pity and sadness. As mentioned in the
previous section, the results for reality TV also showed all other members in the surprise cluster to be
negative. Some of those negative emotion categories were pity and sadness as well, among others.
Another situation that recurs is the incorporation of multiple Ekman emotions in one and the
same cluster. With our results, we saw this happening for three of the four datasets, and each time the
emotion category surprise was involved. This is not very different from what can be seen in the
dendrograms for tweets: the two dendrograms show both surprise and sadness in one cluster. However,
these are not the only Ekman emotions that are clustered together. One of the other clusters also shows
anger and disgust grouped together. Although our own results included a cluster with no fewer than
three Ekman emotions, none of our dendrograms had more than one cluster containing multiple
Ekman emotions. For tweets, this is the case in both dendrograms, which makes this observation quite particular.
A final similarity can be found in the frequency analyses of both studies. The most frequent
emotions for tweets actually show a great resemblance to the most frequent emotions for one of our
topical datasets, namely the one for BG. This observation again involves the emotion categories joy,
contentment and enthusiasm: in both frequency overviews, we can see that the two most frequent
emotions are contentment and joy. The third most frequent emotion in the tweets dataset is enthusiasm,
which happens to be the fourth most frequent emotion for BG.
In relation to emotion labels, our research on TV series transcriptions has produced a total of four label
sets: three are domain-specific in accordance with the specific TV series, while the fourth is rather
general. In theory, the general label set should resemble Ekman’s emotion set the most, seeing that both
frameworks are intended for all sorts of broad topics. Table 2 below gives an overview of the different
emotion frameworks for comparison. The emotion categories which are not part of Ekman’s set are
underlined.
Set                               Number of emotions   Emotion labels
BZL (topic 1)                     5                    anger, fear, joy, sadness, _suffering_
BG (topic 2)                      4                    joy, _love_, sadness, surprise
OV (topic 3)                      7                    anger, disgust, fear, joy, _optimism_, sadness, surprise
BZL+BG+OV (general)               6                    anger, fear, joy, _optimism_, sadness, surprise
Basic emotions (Ekman, 1992)      6                    anger, disgust, fear, joy, sadness, surprise
Tweets (De Bruyne et al., 2019)   5                    anger, joy, _love_, _nervousness_, sadness
Table 2: Emotion sets
With the total number of emotions ranging between four and seven, and the majority of the emotion labels
being Ekman emotions, it is clear that all sets show a fair resemblance to one another. However, if we
compare our label sets to the two other frameworks, it appears that our sets are more similar to Ekman’s
basic emotions set than to the label set for tweets. This difference between the label sets for reality TV
and the one for tweets confirms that the content of both genres clearly differs, which implies that it might
be beneficial to modify the label set according to the topic or genre of the data.
Looking at the final labels for the topics from our research, we notice that there is always only one
emotion category which is not part of Ekman’s set. Interestingly, the label set for OV even incorporates
all of Ekman’s emotions, with the addition of optimism. Our general label set shows all Ekman emotions
but one, disgust, which has been replaced by optimism. With only one discrepancy each, these two label
sets show the greatest resemblance to Ekman’s set. And with only one difference between the two sets,
which is caused by whether or not disgust is incorporated, our general label set and the one for OV are
the most similar to one another. By contrast, the label set for BG deviates the most from Ekman’s set:
only half of his basic emotions remained, and another emotion category, love, was added.
When we examine the frequency of each of the Ekman emotions in Table 2, a first striking
observation is that joy and sadness appear in all of the label sets. These two most frequent emotions are
then followed by anger, which appears in five of the six label sets as it is not incorporated in the set for
BG. Fear is also not incorporated in the set for BG, nor in the label set for tweets, which brings
fear to a total of four appearances. This is the same frequency score as for surprise, as surprise does not
appear in the label set for BZL, nor in the set for tweets. The last Ekman emotion to examine, and the
one which appears the least in the table above, is disgust. Apart from Ekman’s set, disgust only appears
once, namely in our label set for OV.
When it comes to the non-Ekman emotions mentioned in the overview, we can see that love and
optimism are the most popular emotions with two appearances each. Love is part of our label set for BG,
but also appears in the framework for tweets. Remarkably, optimism even appears twice in our label sets
alone, in both the label set for OV and the general label set. The other non-Ekman emotion that appears
in one of our label sets is suffering, which is part of the set for BZL. On the other hand, the label set for
tweets also shows nervousness as a final label, but this emotion was not incorporated in any of our label
sets.
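The comparisons in the preceding paragraphs can be checked mechanically with set operations on the label sets from Table 2. The sketch below is only an illustration: the label sets are copied from Table 2, while the helper name compare_to_reference and the dictionary keys are our own hypothetical choices. It lists, for each label set, which Ekman emotions are shared, which are dropped and which labels are added, and then counts in how many of the six sets each label appears.

```python
from collections import Counter

# Label sets copied from Table 2.
EKMAN = {"anger", "disgust", "fear", "joy", "sadness", "surprise"}
LABEL_SETS = {
    "BZL": {"anger", "fear", "joy", "sadness", "suffering"},
    "BG": {"joy", "love", "sadness", "surprise"},
    "OV": {"anger", "disgust", "fear", "joy", "optimism", "sadness", "surprise"},
    "general": {"anger", "fear", "joy", "optimism", "sadness", "surprise"},
    "Ekman": EKMAN,
    "tweets": {"anger", "joy", "love", "nervousness", "sadness"},
}


def compare_to_reference(label_set, reference=EKMAN):
    """Split a label set into shared, dropped and added labels (hypothetical helper)."""
    return {
        "shared": sorted(label_set & reference),
        "dropped": sorted(reference - label_set),
        "added": sorted(label_set - reference),
    }


comparison = {name: compare_to_reference(labels) for name, labels in LABEL_SETS.items()}

# Number of label sets (out of six) in which each label appears.
appearances = Counter(label for labels in LABEL_SETS.values() for label in labels)

for name, diff in comparison.items():
    print(name, diff)
print(sorted(appearances.items()))
```

For OV, for example, nothing is dropped and only optimism is added, while the general set drops disgust and adds optimism, which matches the description above.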
These observations, especially the similarity to Ekman’s set, suggest that Ekman’s emotions can
indeed be considered the most basic emotions, that they occur the most often, and that some, if not all,
should therefore be incorporated in all label sets. Additionally, the non-Ekman emotions that complete the label
sets are found to give a more accurate indication of the specific data topics. For our research, this means
that those non-Ekman emotions summarise the main theme of the TV series by indicating the overall
emotional feeling. In short, we can infer that Ekman’s basic emotions framework can form a good
foundation for emotion label sets, but should be supplemented by other emotions that fit the content of
the data better.
5.3 Validity, reliability and added value
In this last section of the discussion, we want to reflect on the set-up of our study. It is important to
critically evaluate the process, as there are of course certain limitations to our study. First of all, the
annotation task was performed by one individual and was not reviewed by peers afterwards. Because of
this, it is possible that mistakes or odd annotations were left unseen. As the annotations are at the root
of this study and were used as input for the frequency and cluster analysis, such errors may have
influenced the outcome. However, we are confident that said mistakes were minimised. Clear guidelines
based on inter-annotator agreement (IAA) studies (see footnote 4) were provided and the annotator in
question was already familiar with this type of work.
Secondly, emotion annotation remains quite a subjective matter. Even with guidelines
containing a clear explanation and several subcategories for each of the emotions, the labelling of
emotional content still relies on the intuition of the annotator. Results may differ depending on the
individual performing the task.
Thirdly, the selection of utterances from the transcriptions was not random. The 450
utterances for the subset were selected based on their emotional content, and utterances without
emotions were left out. This decision was made with the purpose of our study in mind, not to tamper
with the data or the results. Including objective utterances with no emotional content would simply be
of little to no use, as emotional content is exactly what we are trying to study. That is why the subset
should not be considered as a false representation of the topics, but rather a targeted selection focusing
on the emotional value.
4 The study on inter-annotator agreement is presented in section 3 of De Bruyne et al. (2019); see
https://www.lt3.ugent.be/publications/towards-an-empirically-grounded-framework-for-emot/ or
https://www.thinkmind.org/index.php?view=article&articleid=huso_2019_1_30_88038.
Lastly, when choosing the umbrella terms for the clusters, priority was given to the Ekman
emotions, even if the cluster contained a non-Ekman emotion category with a higher frequency. We are
aware that this might bias the final label sets and steer them towards a greater resemblance
with Ekman’s emotion set. However, as our intention was to compare the label sets from this study to
already existing frameworks, we decided to select the emotions that were incorporated in a well-known
and frequently used label set. If we had chosen another emotion category for the umbrella term, the
Ekman emotions in a cluster would still be represented by that other emotion label. For this reason, we
did not see any problem in favouring the Ekman emotions for the umbrella terms, as it would only
facilitate the comparison process and make the differences more distinct.
Overall, we are of the opinion that we have presented a valid study. Within the limitations of this
study, our dataset of TV series transcriptions was extensive enough to give a good representation of the
different topics, and the label set we selected was broad enough to first cover a considerable share of the
emotion spectrum and then be narrowed down to a more limited set. While emotion annotation remains
a subjective task, we tried to maximise the objectivity and consistency of the annotations by providing
the annotator with clear guidelines. Our decisions during the research process were not based on personal
preference, but were made with the intention to achieve the best possible outcome.
For future work, it might be interesting to conduct a similar study with a bigger dataset to get a more
accurate representation of the predominant emotions in reality TV. As our research process can easily
be replicated, we also encourage researchers to apply this approach to various other genres and topics
in order to establish frameworks which are modified for specific types of data. This type of research
certainly has an added value in the field of NLP, as it provides empirically grounded emotion
frameworks. With a domain-specific label set, machine learning systems can be trained more adequately
for automatic emotion detection and annotation, in line with the content and purpose of the data. After
all, some of the existing frameworks originate in psychological research rather than
linguistics and are therefore not intended for specific language-related purposes.
6 CONCLUSION
Due to its many applications, emotion detection has become a popular research topic in the field of
natural language processing. While the majority of emotion frameworks have their origins in
psychology, many linguistic researchers still borrow those frameworks without any justification other
than the fact that they contain the most basic emotions. As there is no standard framework available, the
goal of this study was to provide a theoretically and empirically grounded framework for emotion
detection on Dutch data.
We used an extensive emotion framework consisting of 25 emotion categories to label 450 utterances
from Flemish reality TV transcriptions. This corpus of transcriptions incorporated utterances from three
TV series representing three different topics, with a selection of approximately 150 utterances per TV
series. Subsequently, we conducted a frequency and cluster analysis using those annotations, which
resulted in three topical label sets, as well as one general label set for this particular domain of reality
TV. This result immediately provides an affirmative answer to our first and main research question of
whether it is possible to deduce a label set from experimental cluster analysis, confirming our
hypothesis.
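The procedure summarised above (annotate, filter infrequent labels, cluster the remaining labels) can be sketched in plain Python. This is a rough illustration under stated assumptions, not the implementation used in this study: the Dice-based distance, the average linkage, the cut-off of 0.8 and the reading of the frequency threshold as "at least min_freq occurrences" are all assumptions, and the function names and toy annotations are hypothetical.

```python
from itertools import combinations


def label_index(annotations):
    """Map each label to the set of utterance indices it was assigned to."""
    index = {}
    for i, labels in enumerate(annotations):
        for label in labels:
            index.setdefault(label, set()).add(i)
    return index


def dice_distance(a, b):
    """1 minus the Dice coefficient of two non-empty sets."""
    return 1.0 - 2.0 * len(a & b) / (len(a) + len(b))


def cluster_labels(annotations, min_freq=1, cutoff=0.8):
    """Average-linkage agglomerative clustering of emotion labels.

    Labels occurring fewer than `min_freq` times are dropped first.
    Merging stops once the closest pair of clusters is farther apart
    than `cutoff`. Both parameters are illustrative assumptions.
    """
    index = {label: ids for label, ids in label_index(annotations).items()
             if len(ids) >= min_freq}
    clusters = [[label] for label in sorted(index)]

    def distance(c1, c2):
        # Average pairwise distance between the members of two clusters.
        pairs = [(x, y) for x in c1 for y in c2]
        return sum(dice_distance(index[x], index[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        d, c1, c2 = min((distance(a, b), a, b) for a, b in combinations(clusters, 2))
        if d > cutoff:
            break
        clusters = [c for c in clusters if c is not c1 and c is not c2]
        clusters.append(sorted(c1 + c2))
    return sorted(clusters)


# Toy annotations, invented for illustration only.
toy = [
    {"anger", "frustration", "irritation"},
    {"anger", "irritation"},
    {"anger", "frustration"},
    {"joy", "contentment", "enthusiasm"},
    {"joy", "contentment"},
    {"joy", "enthusiasm"},
    {"pride"},
]
initial = cluster_labels(toy, min_freq=1)
adapted = cluster_labels(toy, min_freq=2)
print(initial)
print(adapted)
```

In this toy example, the "adapted" run drops the infrequent label pride before clustering, mirroring the second cluster analysis without infrequent emotion categories. One label per cluster can then be chosen as the umbrella term for the final label set.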
For the dataset, we selected three reality TV series with different topics so that we could not
only compare domains (reality TV versus tweets), but also different topics within the same domain. As
we had expected, the clusters and accordingly the final labels clearly differed depending on the topic:
the label set for the first topic (Bloed, Zweet en Luxeproblemen) consists of the five emotions joy, anger,
sadness, suffering and fear; the label set for the second topic (Blind Getrouwd) contains the four
emotions sadness, surprise, joy and love; finally the label set for the third topic (Ooit Vrij) comprises
the seven emotions joy, optimism, anger, sadness, disgust, fear and surprise. Moreover, the label set for
the general dataset, where all three topics were combined without making a distinction, also differs from
each of the topical label sets, as it consists of the six emotions joy, optimism, anger, fear, sadness and surprise.
In our discussion, we described the comparison of our label sets and clusters for reality TV to
those for tweets (De Bruyne et al., 2019). Although there were some similarities, such as the repeated
clustering of certain emotion categories, there was no clear match between the clusters for the two
domains. The label sets also had some emotion categories in common, but still varied too much to
distinguish a pattern. This suggests that the domain is crucial when it comes to deciding which emotion
categories need to be included in the label set.
As our research revealed, it is very likely that the majority of label sets consists of basic emotions
such as those from Ekman’s model (1992). At the end of our discussion, we already argued that basic
emotions form a good basis for emotion frameworks, but should certainly be supported by other
emotions that are specifically adapted to the content of the data (depending on the domain and/or topic).
To briefly answer our last subquestion: basic emotions are indeed a good foundation for emotion
frameworks, but basic emotions alone do not suffice.
In conclusion, first conducting a cluster analysis is a good way to motivate your choice of emotion labels
for certain data, as emotion clusters differ depending on the topic and domain. As our approach was
data-driven, the final label sets for emotion detection are motivated both theoretically and empirically
rather than selected arbitrarily, and can therefore be expected to perform better in the specific context
they are intended for.
References
Airapetian, A. (2019). Emotion analysis in reality tv: A comparison between emotion annotations for
text and image. (Bachelor’s thesis). Ghent University, Belgium.
Bakliwal, A., Arora, P., & Varma, V. K. (2012). Entity centric opinion mining from blogs. In
Proceedings of the 2nd Workshop on Sentiment Analysis where AI meets Psychology, 53–64.
Retrieved from https://semanticscholar.org
Bakliwal, A., Arora, P., Madhappan, S., Kapre, N., Singh, M., & Varma, V. K. (2012). Mining
sentiments from tweets. In Proceedings of the 3rd Workshop in Computational Approaches to
Subjectivity and Sentiment Analysis, 11-18. Retrieved from https://semanticscholar.org
Blashfield, R. K., & Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In
J. R. Nesselroade, & R. B. Cattell (Eds.), Perspectives on individual differences. Handbook
of multivariate experimental psychology, 447–473. Plenum Press. doi:10.1007/978-1-4613-0893-5_14
Buechel, S., & Hahn, U. (2016). Emotion analysis as a regression problem — Dimensional models
and their implications on emotion representation and metrical evaluation. In ECAI 2016: 22nd
European Conference on Artificial Intelligence, 1114-1122. doi:10.3233/978-1-61499-672-9-1114
Darwin, C. R. (1872). The expression of the emotions in man and animals. London: John Murray.
Retrieved from http://darwin-online.org.uk
Daumé III, H., & Marcu, D. (2006). Domain adaptation for statistical classifiers. Journal of Artificial
Intelligence Research, 26, 101–126. Retrieved from https://semanticscholar.org
De Bruyne, L., De Clercq, O., & Hoste, V. (2019). Towards an empirically grounded framework for
emotion analysis. In Proceedings of HUSO 2019, The fifth international conference on human
and social analytics, 11–16. Presented at the HUSO 2019: The Fifth International Conference
on Human and Social Analytics, IARIA, International Academy, Research, and Industry
Association. Retrieved from https://biblio.ugent.be/publication/8624200
Dice, L. (1945). Measures of the amount of ecologic association between species. Ecology, 26(3),
297-302. doi:10.2307/1932409
Ekman, P. (1992). An argument for basic emotions. Cognition and Emotion, 6(3/4), 169-200.
doi:10.1080/02699939208411068
Ekman, P. (1997). Emotion families. In Semiotics around the World: Synthesis in Diversity, 191-193.
Berlin: Mouton de Gruyter. Retrieved from https://www.paulekman.com
Ekman, P. (1999). Basic emotions. In Handbook of Cognition and Emotion, 45-60. New York: John
Wiley & Sons Ltd. doi:10.1002/0470013494.ch3
Ekman, P., & Friesen, W. V. (1971). Constants across cultures in the face and emotion. Journal of
Personality and Social Psychology, 17(2), 124-129. doi:10.1037/h0030377
Ekman, P., Friesen, W. V., & Ellsworth, P. (1972). Emotion in the human face: Guidelines for
research and an integration of findings. New York: Pergamon Press. doi:10.1016/C2013-0-02458-9
Fang, X., & Zhan, J. Z. (2015). Sentiment analysis using product review data. Journal of Big Data,
2(5), 1-14. doi:10.1186/s40537-015-0015-2
Fisher, L., & Van Ness, J. (1971). Admissible Clustering Procedures. Biometrika, 58(1), 91-104.
doi:10.2307/2334320
Frijda, N. H. (1988). The laws of emotion. American Psychologist, 43(5), 349-358.
doi:10.1037/0003-066X.43.5.349
Ghent University (2020). EmotioNL: Emotion detection for Dutch. Retrieved from
https://research.flw.ugent.be/en/projects/emotionl-emotion-detection-dutch
Glorot, X., Bordes, A., & Bengio, Y. (2011). Domain adaptation for large-scale sentiment
classification: a deep learning approach. In Proceedings of the 28th International Conference
on Machine Learning (ICML’11), 513–520. Retrieved from
https://semanticscholar.org
Godbole, N., Srinivasaiah, M., & Skiena, S. (2007). Large-scale sentiment analysis for news and
blogs. In Proceedings of the International Conference on Weblogs and Social Media
(ICWSM’2007). Retrieved from https://semanticscholar.org
Houbregs, J. (Writer), & Belien, S. (Director). (2018). Bloed, Zweet en Luxeproblemen [Television
series]. In J. Houbregs (Executive Producer). Zaventem: Warner Bros. ITVP België.
Izard, C. E. (1971). The face of emotion. Appleton-Century-Crofts.
Izard, C. E. (1991). The psychology of emotions. New York: Springer. doi:10.1007/978-1-4899-0615-1
Lazar, C. (2012). Cluster analysis [PowerPoint slides]. Retrieved from
https://ai.vub.ac.be/sites/default/files/lecturemaster2011.pdf
Mohammad, S. M. (2016). Sentiment analysis: Detecting valence, emotions, and other affectual states
from text. In Emotion Measurement, 201-237. doi:10.1016/b978-0-08-100508-8.00009-6
Parrott, W. G. (2001). Emotions in social psychology: Essential readings. Philadelphia: Psychology
Press. Retrieved from http://books.google.be/books
Paul Ekman International. (2018, April 24). A brief look into Dr. Paul Ekman's early research.
Retrieved May 2019, from https://www.ekmaninternational.com/a-brief-history-into-paul-
ekmans-early-research/
Plutchik, R. (1962). The emotions: Facts, theories and a new model. New York: Random House.
Retrieved from https://archive.org
Plutchik, R. (1980). A general psychoevolutionary theory of emotion. Emotion: Theory, Research, and
Experience, 1(3), 3-33. doi:10.1016/B978-0-12-558701-3.50007-7
Russell, J. A. (1980). A circumplex model of affect. Journal of Personality and Social Psychology,
39(6), 1161–1178. doi:10.1037/h0077714
Russell, J., & Mehrabian, A. (1977). Evidence for a three-factor theory of emotions. Journal of
Research in Personality, 11(3), 273-294. doi:10.1016/0092-6566(77)90037-X
Schröder, M., Pirker, H., & Lamolle, M. (2006). First suggestions for an emotion annotation and
representation language. In Proceedings of LREC, 88-92. Retrieved from
https://www.academia.edu/
Shaver, P., Schwartz, J., Kirson, D., & O'Connor, C. (1987). Emotion knowledge: Further exploration
of a prototype approach. Journal of Personality and Social Psychology, 52(6), 1061–
1086. doi:10.1037/0022-3514.52.6.1061
Sneath, P. H. A., & Sokal, R. R. (1973). Numerical Taxonomy: The Principles and Practice of
Numerical Classification. San Francisco: Freeman.
The Ekmans’ Atlas of Emotions. (n.d.). Retrieved May 2020, from http://atlasofemotions.org/
Thet, T. T., Na, J., & Khoo, C. S. (2010). Aspect-based sentiment analysis of movie reviews on
discussion boards. Journal of Information Science, 36(6), 823-848.
doi:10.1177/0165551510388123
Tomkins, S. S. (1962). Affect, imagery, consciousness. Volume I: The positive affects. New York:
Springer. Retrieved from http://books.google.be/books
Tomkins, S. S. (1984). Affect theory. In K. R. Scherer, & P. Ekman (Eds.), Approaches to Emotion,
163-195. Lawrence Erlbaum Associates.
Uytterhoeven, T. (Director). (2019). Ooit Vrij [Television series]. In I. Colpaert (Producer).
Vilvoorde: Woestijnvis.
Van Hoecke, E. (Director). (2019). Blind Getrouwd [Television series]. In M. Miller (Producer), L.
Lombaert (Executive Producer). Vilvoorde: Productie PIT, Antwerpen: DPG Media.
Ward, J. H. (1963). Hierarchical grouping to optimize an objective function. Journal of the
American Statistical Association, 58(301), 236-244. doi:10.2307/2282967
Wood, I. D., McCrae, J. P., Andryushechkin, V., & Buitelaar, P. (2018). A comparison of emotion
annotation approaches for text. Information, 9(5), 117. doi:10.3390/info9050117
Appendices
Appendix 1 (electronic): Annotations