
Deep Learning for Natural Language Sentiment and Affect

Muhammad Abdul-Mageed

The University of British Columbia

[email protected]

(Abdul-Mageed & Kralj Novak, 2018)

Petra Kralj Novak

Jožef Stefan Institute

[email protected]



Outline

• Introduction

• Classical Methods

• Deep Learning Methods – on separate slides

• Multilingual Approaches

• Resources

• Ethics


Introduction


How far can we go with machines?


Information Overload

https://beta.techcrunch.com/2017/06/27/facebook-2-billion-users/

Financial Markets

https://marketinsight.gkfx.com/

Sentiment analysis (broad definition)

• Sentiment analysis and opinion mining is the field of study that analyzes people’s

• opinions,

• sentiments,

• evaluations,

• attitudes, and

• emotions

from written language.

Sentiment analysis (narrow definition)

• Sentiment analysis is the process of computationally determining whether a piece of text is positive, neutral, or negative.

• Sentiment polarity (positive vs. negative) & subjectivity (subjective vs. objective)

Examples

• Opposite orientations in different application domains
  • “This camera sucks.”
  • “This vacuum cleaner really sucks.”

• Sarcasm
  • “What a great car! It stopped working in two days.”

• Opinions without sentiment words
  • “This washer uses a lot of water.”

• Ambiguous
  • “It is my birthday today.”

• Language specific
  • “Na ECML konferenčni večerji smo se zabavali ob čudoviti glasbi in plesu.” (Slovenian: “At the ECML conference dinner, we enjoyed wonderful music and dancing.”)

Granularity level

• Word

• Sentence, paragraph

• Document

Classical methods


Sentiment lexicons

• Positive: good, wonderful, amazing

• Negative: bad, poor, terrible, cost someone an arm and a leg
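A minimal sketch of lexicon-based polarity scoring, assuming a tiny hypothetical lexicon built only from the example entries above. Multiword expressions are stored as token tuples and matched before single words (longest match first). The idiom is simplified to "an arm and a leg" because the sketch does no lemmatization ("cost me", "costs"); it also ignores negation and domain effects such as "sucks" for a vacuum cleaner.

```python
import re

# Hypothetical toy lexicon built from the slide's examples (+1 positive, -1 negative).
# Entries are token tuples so that multiword expressions can be matched.
LEXICON = {
    ("good",): 1, ("wonderful",): 1, ("amazing",): 1,
    ("bad",): -1, ("poor",): -1, ("terrible",): -1,
    # Simplified form of "cost someone an arm and a leg" (no lemmatization here).
    ("an", "arm", "and", "a", "leg"): -1,
}
MAX_LEN = max(len(entry) for entry in LEXICON)


def tokenize(text):
    """Lowercase and keep alphabetic tokens (apostrophes allowed)."""
    return re.findall(r"[a-z']+", text.lower())


def lexicon_score(text):
    """Sum entry polarities, trying the longest possible match at each position."""
    tokens = tokenize(text)
    score, i = 0, 0
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            entry = tuple(tokens[i:i + n])
            if entry in LEXICON:
                score += LEXICON[entry]
                i += n
                break
        else:  # no lexicon entry starts at position i
            i += 1
    return score


def polarity(text):
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(polarity("The food was amazing and the staff wonderful"))  # positive
print(polarity("That phone cost me an arm and a leg"))           # negative
print(polarity("It is my birthday today."))                      # neutral
```

Summing +1/-1 hits is the simplest aggregation; graded lexicons (see AFINN below) replace the ±1 values with intensities.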

Lexical sentiment analysis: Loughran and McDonald Sentiment Word Lists

Lexical sentiment analysis of mainstream news: Bitcoin

http://newstream.ijs.si/

Lexical vs. machine learning methods

[Figure: lexical and machine learning approaches compared side by side]

Maite Taboada, Sentiment Analysis: An Overview from Linguistics. Annual Review of Linguistics 2016 2:1, 325-347

Stance analysis

• Stance detection is the task of automatically determining whether the author of a text is in favor of, neutral toward, or against a given target.

• Example:
  • Target: legalization of abortion
  • Tweet: “A fetus has rights too! Make your voice heard.”
  • Stance: against

Slovenian presidential elections 2012

• Stance analysis on manually annotated Twitter data:
  • Each tweet annotated as in favor of, neutral toward, or against each of the candidates

• Model: SVM with a linear kernel

Affect: Emotion is Pervasive

[Credit: www.uwtsd.ac.uk]

Emotion in Public Discourse

[Credit: https://www.youtube.com/watch?v=Ixkp0T3-1YE]

Hard Feelings: Science’s Struggle to Define Emotions

Source: https://www.theatlantic.com/health/archive/2015/02/hard-feelings-sciences-struggle-to-define-emotions/385711

What is emotion?

• “[E]veryone knows what an emotion is, until asked to give a definition. Then, it seems, none knows” (Fehr & Russell, 1984)

• Definitions vary as a function of:
  • discipline or approach
  • time or culture

• ~100 definitions of emotion (Kleinginna & Kleinginna, 1981)

Models of emotion

• Categorical models of basic emotion (e.g., Matsumoto & Ekman, 2009; Panksepp, 2005)

• Bidimensional models (e.g., Russell, 2009)

• Appraisal models (e.g., Arnold, 1950; 1960; Lazarus, 1991; Scherer et al., 2001)

• Other…

Basic emotion models

• Categorical models (e.g., Matsumoto & Ekman, 2009; Panksepp, 2005): anger, disgust, fear, joy, sadness, surprise

Bidimensional Models

[Figure: two-dimensional space with axes valence and arousal; example states: aroused, sleepy, pleased, frustrated]

Plutchik Wheel of Emotions

• 3 circles of arousal: core, primary, and secondary (p1, p2, p3)

• 8 dimensions

Arousal

2 Dimensions of Valence

Learning emotion

• Multiclass classification task

• Similar to learning sentiment (text classification)


The sentiment analysis pipeline

[Figure: pipeline with numbered steps 1–5: millions of documents, thousands of documents, classifier, millions of documents]

Data acquisition and labeling

• Acquisition: relevant data

• Annotation:
  • Representative sample
  • Sample size: 20 – 100K
  • Duplicates

• Annotators:
  • Clear instructions with examples
  • Annotator self-agreement
  • Inter-annotator agreement

Zollo, F., Novak, P.K., Del Vicario, M., Bessi, A., Mozetič, I., Scala, A., Caldarelli, G. and Quattrociocchi, W., 2015. Emotional dynamics in the age of misinformation. PloS one, 10(9), p.e0138740.

Size of training dataset: saturation point

• Monitor classifier performance while feeding increasingly larger training sets

[Figure: classifier performance vs. training set size, shown together with the inter-annotator agreement. Panels: “Saturation point not reached at 90,000 tweets” and “Saturation point at 70,000 tweets”]

Mozetič, I., Grčar, M. and Smailović, J., 2016. Multilingual Twitter sentiment classification: The role of human annotators. PloS one, 11(5), p.e0155036.

The role of human sentiment annotators

Comparison of annotator self-agreement, inter-annotator agreement, and the performance of an automated sentiment classifier, all in terms of Krippendorff’s Alpha.

Mozetič, I., Grčar, M. and Smailović, J., 2016. Multilingual Twitter sentiment classification: The role of human annotators. PloS one, 11(5), p.e0155036.
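Krippendorff’s Alpha compares observed disagreement with the disagreement expected by chance: α = 1 − D_o / D_e. The sketch below computes it from a coincidence matrix, assuming labels encoded as numbers (e.g. -1, 0, +1) and the squared-difference (interval) distance by default, with a nominal distance as an alternative. The encoding and distance choice are our assumptions; the cited study’s exact setup may differ.

```python
from collections import Counter
from itertools import permutations


def interval_delta(a, b):
    return (a - b) ** 2


def nominal_delta(a, b):
    return 0.0 if a == b else 1.0


def krippendorff_alpha(units, delta=interval_delta):
    """units: one list per annotated item, holding all labels it received.

    Assumes the data contain at least two distinct labels (otherwise D_e = 0).
    """
    # Coincidence matrix: each ordered pair of labels within a unit
    # contributes 1 / (m_u - 1), where m_u is the number of labels on that unit.
    o = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # a unit with a single label cannot be paired
        for a, b in permutations(labels, 2):
            o[(a, b)] += 1.0 / (m - 1)
    n_c = Counter()
    for (a, _), weight in o.items():
        n_c[a] += weight
    n = sum(n_c.values())
    observed = sum(w * delta(a, b) for (a, b), w in o.items()) / n
    expected = sum(n_c[a] * n_c[b] * delta(a, b) for a in n_c for b in n_c) / (n * (n - 1))
    return 1.0 - observed / expected


# Two annotators, three tweets; they disagree on the third tweet.
print(round(krippendorff_alpha([[0, 0], [1, 1], [0, 1]], nominal_delta), 3))  # 0.444
```

The same function measures self-agreement (the same annotator labeling a tweet twice) and inter-annotator agreement (different annotators); only the grouping of labels into units changes.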

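Distant supervision

To build a dataset, derive noisy labels from hints in the text:

• Emoticons/emoji

• #tags

• Seed words (good, bad)

Remove the hints from the text before training, so that the classifier cannot simply rely on them.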
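A minimal sketch of emoticon-based distant supervision. The hint sets are hypothetical and far smaller than real ones; tweets with no hint or with conflicting hints are discarded, and the hints are stripped from the text that is kept. Tokenization is plain whitespace splitting, so an emoticon glued to a word ("great:)") is missed.

```python
# Hypothetical hint sets; real projects use far larger emoticon, emoji, and hashtag lists.
POSITIVE_HINTS = {":)", ":-)", ":D", "😀", "#happy"}
NEGATIVE_HINTS = {":(", ":-(", "😢", "#sad"}
ALL_HINTS = POSITIVE_HINTS | NEGATIVE_HINTS


def distant_label(text):
    """Return (text_without_hints, label), or None if hints are missing or conflicting."""
    tokens = text.split()  # whitespace tokens, so "great:)" would not be detected
    has_pos = any(t in POSITIVE_HINTS for t in tokens)
    has_neg = any(t in NEGATIVE_HINTS for t in tokens)
    if has_pos == has_neg:  # no hint at all, or hints of both polarities
        return None
    # Strip the hints so the classifier cannot simply memorize them.
    cleaned = " ".join(t for t in tokens if t not in ALL_HINTS)
    return cleaned, "positive" if has_pos else "negative"


tweets = ["Great game tonight :)", "Missed the bus again :(", "Nice :) but sad :(", "Just a tweet"]
dataset = [pair for pair in map(distant_label, tweets) if pair is not None]
print(dataset)  # [('Great game tonight', 'positive'), ('Missed the bus again', 'negative')]
```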


2, 3 or more class problem?

• 2-class problem
  • Whether a review posted online (of a movie, a book, or a consumer product) is positive or negative towards the item being reviewed

• 3-class problem
  • Whether the sentiment of the text is positive, neutral, or negative

• Multi-class problem
  • Emotion detection

Exercise: confusion matrix of a classifier

• Accuracy = 80% in both cases

• The errors in the first matrix are more severe than in the second
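The slide’s matrices are not reproduced in this text, so the sketch uses two hypothetical matrices with the same 80% accuracy. Treating the classes as ordinal (negative < neutral < positive) and measuring the mean absolute distance between true and predicted class separates them: the first matrix confuses negative with positive, the second only confuses neighbouring classes.

```python
# Hypothetical confusion matrices (rows = true class, columns = predicted class),
# with classes in ordinal order: negative, neutral, positive.
HEAVY = [[30, 0, 10],   # errors jump between negative and positive
         [0, 20, 0],
         [10, 0, 30]]
LIGHT = [[30, 10, 0],   # errors only move to a neighbouring class
         [5, 20, 5],
         [0, 0, 30]]


def accuracy(cm):
    total = sum(map(sum, cm))
    return sum(cm[i][i] for i in range(len(cm))) / total


def mean_absolute_error(cm):
    """Average distance |true - predicted| between class indices 0, 1, 2."""
    total = sum(map(sum, cm))
    k = len(cm)
    return sum(cm[i][j] * abs(i - j) for i in range(k) for j in range(k)) / total


for name, cm in [("HEAVY", HEAVY), ("LIGHT", LIGHT)]:
    print(name, accuracy(cm), mean_absolute_error(cm))
# HEAVY 0.8 0.4
# LIGHT 0.8 0.2
```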


Problem formulation: Ordinal regression

• Three-class problem: negative, neutral, positive

• An error from positive to negative is larger than an error from positive to neutral

• Measures of quality:
  • Accuracy, Accuracy@1
  • F1
  • MAE, MSE
  • Cohen’s Kappa
  • Krippendorff’s Alpha
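Several of these measures can be computed directly from label lists. The sketch assumes labels encoded as ordinal integers (-1 negative, 0 neutral, +1 positive) and takes Accuracy@1 to mean "predicted class at most one step away from the true class", so for three classes only negative/positive confusions count as errors. Cohen’s Kappa here is the standard two-rater formula, treating the classifier as the second rater.

```python
from collections import Counter


def ordinal_metrics(y_true, y_pred):
    """Labels encoded as ordinal integers: -1 negative, 0 neutral, +1 positive."""
    n = len(y_true)
    diffs = [abs(t - p) for t, p in zip(y_true, y_pred)]
    accuracy = sum(d == 0 for d in diffs) / n
    accuracy_at_1 = sum(d <= 1 for d in diffs) / n  # wrong by at most one class
    mae = sum(diffs) / n
    mse = sum(d * d for d in diffs) / n
    # Cohen's kappa: observed agreement corrected for chance agreement p_e,
    # estimated from how often each label occurs in y_true and in y_pred.
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    kappa = (accuracy - p_e) / (1 - p_e)
    return {"accuracy": accuracy, "accuracy@1": accuracy_at_1,
            "mae": mae, "mse": mse, "kappa": kappa}


print(ordinal_metrics([1, 1, 0, -1, -1], [1, 0, 0, -1, 1]))
```

In this toy example accuracy is 0.6 while Accuracy@1 is 0.8, and MSE (1.0) penalizes the single positive-for-negative error more than MAE (0.6) does; kappa comes out at about 0.41.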


• Accuracy = 80%, F1 = 0.71

• Accuracy = 80%, F1 = 0.83





Classifier

• Traditional approaches: SVM, Naïve Bayes

• Neural networks
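To make the "traditional approaches" concrete, here is a from-scratch sketch of multinomial Naïve Bayes over tokenized documents with add-one (Laplace) smoothing. It is for illustration only; in practice a library implementation (e.g. scikit-learn’s `MultinomialNB` on a bag-of-words matrix) would be used. The class name and toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSentiment:
    """Multinomial Naive Bayes over token lists, with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> token -> count
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(cnt.values()) for c, cnt in self.word_counts.items()}
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        best_label, best_logp = None, -math.inf
        for label, n_label in self.class_counts.items():
            logp = math.log(n_label / n_docs)  # log prior P(label)
            denom = self.total_words[label] + len(self.vocab)
            for w in tokens:
                if w in self.vocab:  # tokens unseen in training are ignored
                    logp += math.log((self.word_counts[label][w] + 1) / denom)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


model = NaiveBayesSentiment().fit(
    [["great", "movie"], ["loved", "it"], ["awful", "plot"], ["boring", "awful"]],
    ["pos", "pos", "neg", "neg"],
)
print(model.predict(["great", "loved"]))            # pos
print(model.predict(["awful", "movie", "boring"]))  # neg
```

Summing log-probabilities instead of multiplying probabilities avoids numerical underflow on long documents.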

Data representation 1: BOW

• Each word is one dimension

• Each document is one point in this high-dimensional space (on the unit hypersphere, once document vectors are length-normalized)
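A small sketch of the bag-of-words representation using sparse dictionaries: each distinct word is a dimension, and L2 normalization puts every document on the unit hypersphere, where cosine similarity reduces to a dot product. Tokenization with `\w+` is a simplifying assumption.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Sparse term counts: every distinct word is one dimension."""
    return Counter(re.findall(r"\w+", text.lower()))


def l2_normalize(vec):
    """Scale to unit Euclidean length, i.e. project onto the unit hypersphere."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {w: v / norm for w, v in vec.items()} if norm else dict(vec)


def cosine(a, b):
    """For unit-length vectors, cosine similarity is just the dot product."""
    return sum(v * b.get(w, 0.0) for w, v in a.items())


d1 = l2_normalize(bag_of_words("good good movie"))
d2 = l2_normalize(bag_of_words("Good movie"))
d3 = l2_normalize(bag_of_words("terrible plot"))
print(round(cosine(d1, d2), 3), cosine(d1, d3))  # 0.949 0.0
```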

Social media specific sentiment features


Data representation: Additional features

• BOW (bag of words) + additional features:
  • Word N-grams: (Justin Bieber, video games, not happy)
  • Punctuation:
  • Emoticons and emoji:
  • Preprocessing: baaaaaaad → baaad
  • Capitalization: SCREAMING

• Language specific:
  • Lists of positive and negative words: SentiWordNet
  • Spellings of swearing: f**k
  • Language (keyboard) specific emoticons: ಠ_ಠ , ƸӜƷ
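A sketch of a few of these extra features, meant to be appended to the BOW vector. The emoticon list is hypothetical, and the elongation rule (collapse four or more repeated characters to three) is an assumption chosen to reproduce the slide’s "baaaaaaad → baaad" example.

```python
import re

# Hypothetical emoticon inventory; real systems use larger, language-aware lists.
EMOTICONS = {":)", ":-)", ":(", ":-(", ":D", "ಠ_ಠ"}


def normalize_elongation(text):
    """Collapse runs of 4+ identical characters to 3: 'baaaaaaad' -> 'baaad'."""
    return re.sub(r"(.)\1{3,}", r"\1\1\1", text)


def word_ngrams(tokens, n=2):
    """Contiguous word n-grams, e.g. 'not happy' keeps the negation attached."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extra_features(text):
    tokens = text.split()
    return {
        "exclamations": text.count("!"),
        "questions": text.count("?"),
        "all_caps_words": sum(1 for t in tokens if len(t) > 1 and t.isupper()),
        "emoticons": sum(1 for t in tokens if t in EMOTICONS),
        "elongated": int(normalize_elongation(text) != text),
    }


print(normalize_elongation("baaaaaaad"))       # baaad
print(word_ngrams(["not", "happy", "today"]))  # ['not happy', 'happy today']
print(extra_features("This is SOOOOO baaaaaaad!!! :("))
```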

Precision-recall tuning

• Tune the classifier so that, for both the positive and the negative class, precision and recall are similar
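Checking this balance requires per-class precision and recall. The sketch computes them from label lists, plus F1 averaged over the positive and negative classes only; that averaging is one common convention for three-class sentiment, and we do not know whether the F1 values quoted above use it. The toy labels are hypothetical.

```python
def per_class_scores(y_true, y_pred, labels):
    """Precision, recall, and F1 for every class label."""
    scores = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        predicted = sum(p == c for p in y_pred)
        actual = sum(t == c for t in y_true)
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[c] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


y_true = ["positive", "positive", "negative", "negative", "neutral"]
y_pred = ["positive", "negative", "negative", "negative", "positive"]
s = per_class_scores(y_true, y_pred, ["negative", "neutral", "positive"])
f1_pm = (s["positive"]["f1"] + s["negative"]["f1"]) / 2  # F1 averaged over + and - only
print(s["negative"])  # precision ≈ 0.67, recall = 1.0: the negative class is over-predicted
```

A gap like this (recall well above precision) suggests moving the decision threshold, or reweighting classes, until the two are closer.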

Deep learning methods


Multilingual sentiment analysis

• Lo, S.L., Cambria, E., Chiong, R. and Cornforth, D., 2017. Multilingual sentiment analysis: from formal to informal and scarce resource languages. Artificial Intelligence Review, 48(4), pp.499-527.

• Korayem, M., Aljadda, K. and Crandall, D., 2016. Sentiment/subjectivity analysis survey for languages other than English. Social network analysis and mining, 6(1), p.75.

NLP != English LP

[Figure: languages in the world vs. languages on Twitter. Image from https://fledu.uz/]

Multilingual sentiment analysis approaches

A. Translation-based sentiment analysis
B. Corpus-based
C. Lexicon-based sentiment analysis
D. Machine learning approaches
E. Language-independent approaches

Translation-based sentiment analysis (1)

Original document → (machine translation) → English document → (apply English sentiment analysis) → sentiment classification

Translation-based sentiment analysis (2)

Sentiment-labeled corpus (English) → (machine translate to target language) → corpus in target language → (build an ML model) → sentiment model for target language

Original document → sentiment model for target language → sentiment classification

Corpus-based

Parallel corpora → (apply “English” sentiment model) → (transfer labels) → (build sentiment model for target language) → sentiment model for target language
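A sketch of the label-transfer step: an existing English classifier labels the English side of each aligned sentence pair, and the label is copied to the target-language side, yielding a training corpus for the target language. The confidence filter and the toy classifier are our assumptions, not part of the slide; any English model returning a (label, confidence) pair could be plugged in.

```python
from typing import Callable, List, Tuple


def transfer_labels(
    pairs: List[Tuple[str, str]],
    english_model: Callable[[str], Tuple[str, float]],
    min_confidence: float = 0.8,
) -> List[Tuple[str, str]]:
    """Label the English side of each aligned pair and copy the label to the target side.

    pairs: (English sentence, aligned target-language sentence).
    english_model: any English classifier returning (label, confidence).
    """
    corpus = []
    for english, target in pairs:
        label, confidence = english_model(english)
        if confidence >= min_confidence:  # assumption: drop low-confidence projections
            corpus.append((target, label))
    return corpus


def toy_english_model(sentence):
    # Hypothetical stand-in for a trained English sentiment classifier.
    return ("positive", 0.9) if "great" in sentence.lower() else ("negative", 0.6)


pairs = [("This is great", "To je odlično"), ("This is awful", "To je grozno")]
print(transfer_labels(pairs, toy_english_model))  # [('To je odlično', 'positive')]
```

The returned (target sentence, label) pairs are then used to train the target-language model with any standard classifier.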

Lexicon-based sentiment analysis

• Build a sentiment lexicon for the target language:
  • Translation of lexica (+ check the 10,000 most frequent words)
  • WordNet (words and semantic relations) + seed words
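The WordNet + seed words approach can be sketched as polarity propagation over semantic relations: synonyms inherit a word’s polarity, antonyms receive the opposite one. The toy thesaurus below is hypothetical and stands in for a target-language WordNet; breadth-first search with a depth limit keeps propagation close to the seeds, where it is most reliable.

```python
from collections import deque

# Hypothetical toy thesaurus standing in for a WordNet-style resource of the
# target language (English words used here for readability).
SYNONYMS = {"good": ["fine", "nice"], "nice": ["pleasant"], "bad": ["poor"], "poor": ["inferior"]}
ANTONYMS = {"pleasant": ["unpleasant"], "fine": ["bad"]}


def expand_lexicon(seeds, max_depth=2):
    """Breadth-first polarity propagation: synonyms keep polarity, antonyms flip it."""
    lexicon = dict(seeds)
    queue = deque((word, 0) for word in seeds)
    while queue:
        word, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond max_depth steps from a seed
        neighbors = [(w, lexicon[word]) for w in SYNONYMS.get(word, [])]
        neighbors += [(w, -lexicon[word]) for w in ANTONYMS.get(word, [])]
        for w, pol in neighbors:
            if w not in lexicon:  # first assignment wins; later conflicts are ignored
                lexicon[w] = pol
                queue.append((w, depth + 1))
    return lexicon


print(expand_lexicon({"good": 1, "bad": -1}))
# {'good': 1, 'bad': -1, 'fine': 1, 'nice': 1, 'poor': -1, 'pleasant': 1, 'inferior': -1}
```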

Machine learning approaches

1. Labeled dataset
  • Manual annotation
  • Distant supervision
    • Emoji/emoticons
    • Positive and negative #tags
    • Seed words

2. Build a machine learning model

Languages of rich morphology

(Abdul-Mageed, 2018)

Arabic

(Abdul-Mageed, 2018)


Segmentation

POS Tagging

ASMA: Segmentation & Morphosyntactic Disambiguation

ASMA: A Real-World Example

Modeling in lexical space

Modeling in morphosyntactic space

(Abdul-Mageed, 2015. Dissertation)

Resources & Venues


Sentiment Resources

• Lexicons

• Models & libraries

• Annotated sentiment data


Lexicons

• AFINN

• Bing Liu's Opinion Lexicon

• MPQA Subjectivity Lexicon

• Harvard General Inquirer

• SentiWordNet

• Loughran-McDonald Sentiment Word Lists

• Sentiment Lexicons for 81 Languages

• Emoji sentiment ranking

• Emoticon Sentiment Lexicon

• Sifat (Arabic adjectives)


http://sentiment.christopherpotts.net/lexicons.html#opinionlexicon


http://sentiment.christopherpotts.net/lexicons.html#mpqa

http://sentiment.christopherpotts.net/lexicons.html#inquirer

http://sentiment.christopherpotts.net/lexicons.html#sentiwordnet

https://sraf.nd.edu/textual-analysis/

https://www.kaggle.com/rtatman/sentiment-lexicons-for-81-languages

http://kt.ijs.si/data/Emoji_sentiment_ranking/

http://people.few.eur.nl/hogenboom/files/EmoticonSentimentLexicon.zip

AFINN

• A list of English words rated for valence

• Scale [-5,5]

• 2477 words and phrases

• Licence: Open Database License (ODbL) v1.0

• An evaluation of the word list is available in: Finn Årup Nielsen, "A new ANEW: Evaluation of a word list for sentiment analysis in microblogs", Proceedings of the ESWC2011 Workshop on 'Making Sense of Microposts': Big things come in small packages, CEUR Workshop Proceedings 718: 93-98, May 2011. http://arxiv.org/abs/1103.2903
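A sketch of loading and applying an AFINN-style lexicon, assuming the tab-separated `term<TAB>score` layout of the AFINN distribution files (terms may contain spaces). The entries below are made up for illustration and are not copied from AFINN; only single-word terms are matched when scoring.

```python
import re


def load_afinn(lines):
    """Parse 'term<TAB>score' lines into a dict; terms may contain spaces."""
    lexicon = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        term, score = line.rsplit("\t", 1)
        lexicon[term] = int(score)
    return lexicon


def afinn_score(text, lexicon):
    """Return (sum, mean per token) of valences; only single-word entries are matched."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = sum(lexicon.get(t, 0) for t in tokens)
    mean = total / len(tokens) if tokens else 0.0
    return total, mean


# Illustrative entries in the AFINN layout; the values are made up, not copied from AFINN.
sample = ["good\t3", "bad\t-3", "not good\t-2", ""]
lexicon = load_afinn(sample)
print(afinn_score("Good food, bad service, good price!", lexicon))  # (3, 0.5)
```

The sum rewards long, repetitive texts; the per-token mean is a simple length normalization.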


Emoji sentiment ranking

• Sentiment of the 751 most common emojis

• Constructed from 75,000 manually sentiment-labeled tweets containing emoji, in 13 European languages

• Similar format to SentiWordNet

• Kralj Novak P, Smailović J, Sluban B, Mozetič I (2015) Sentiment of Emojis. PLoS ONE 10(12): e0144296. https://doi.org/10.1371/journal.pone.0144296

• http://kt.ijs.si/data/Emoji_sentiment_ranking/
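A sketch of turning label counts into an emoji sentiment score, assuming the score is the mean of the negative/neutral/positive distribution (-1, 0, +1) of the tweets in which an emoji occurs, with Laplace-smoothed probabilities. This follows our reading of the cited paper; consult it for the exact definition. The emoji names and counts are hypothetical.

```python
def emoji_sentiment(n_neg, n_neu, n_pos):
    """Mean of the -1/0/+1 label distribution, with Laplace-smoothed probabilities."""
    n = n_neg + n_neu + n_pos
    p_neg = (n_neg + 1) / (n + 3)  # add-one smoothing over 3 classes
    p_pos = (n_pos + 1) / (n + 3)
    return p_pos - p_neg  # (-1) * p_neg + 0 * p_neu + (+1) * p_pos


# Hypothetical counts of tweets containing each emoji, by tweet sentiment label.
counts = {"emoji_a": (10, 20, 67), "emoji_b": (60, 25, 12)}
ranking = sorted(counts, key=lambda e: emoji_sentiment(*counts[e]), reverse=True)
print(ranking)  # ['emoji_a', 'emoji_b']
```

Smoothing pulls rarely seen emojis toward 0, so a single positive tweet does not make an emoji maximally positive.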

Models

• TextBlob
  • PatternAnalyzer: based on a lexicon of adjectives
  • NaiveBayesAnalyzer: an NLTK classifier trained on a movie reviews corpus
  • (Python) https://textblob.readthedocs.io/en/dev/

• ipublia sentiment analysis
  • English, German, French and Italian
  • (Python, REST) https://github.com/ipublia/sentiment-analysis

Annotated sentiment data

• Twitter sentiment for 15 European languages (1,643,735 manually annotated tweets)

• SemEval competition data

• Bing Liu’s customer reviews and other datasets

• Product reviews: a few million Amazon customer reviews with star ratings, useful for training a sentiment analysis model.

• Restaurant reviews: 5.2 million Yelp reviews with star ratings.

• Movie reviews: 1,000 positive and 1,000 negative processed reviews, plus 5,331 positive and 5,331 negative processed sentences/snippets.

• Fine food reviews: ~500,000 food reviews from Amazon, including product and user information, ratings, and a plain-text version of every review.

• Twitter airline sentiment on Kaggle: ~15,000 tweets about airlines, labeled positive, neutral, or negative.

• First GOP Debate Twitter Sentiment: ~14,000 tweets about the first GOP debate of the 2016 US presidential campaign, labeled positive, neutral, or negative.

http://hdl.handle.net/11356/1054

http://alt.qcri.org/semeval2018/index.php?id=tasks

https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html#datasets

https://www.kaggle.com/bittlingmayer/amazonreviews

https://www.kaggle.com/yelp-dataset/yelp-dataset

http://www.cs.cornell.edu/people/pabo/movie-review-data/

https://www.kaggle.com/snap/amazon-fine-food-reviews

https://www.kaggle.com/crowdflower/twitter-airline-sentiment

https://www.kaggle.com/crowdflower/first-gop-debate-twitter-sentiment

Emotion Resources

• Lexicons
  • NRC emotion lexicon
  • UBC emotion lexicon (ongoing work)

• Data
  • SemEval 2007; 2018; 2019
  • Aman and Szpakowicz (2007)
  • Abdul-Mageed and Ungar (2017)
  • Alhuzali, Abdul-Mageed, and Ungar (2018) (Arabic)

https://saifmohammad.com/WebPages/NRC-Emotion-Lexicon.htm

Biases & Ethics


Biases: Social media data is not representative

• Demographic differences between social media users and “target population”

• Behaviour biases

• Linking biases

• Temporal variations


Ethics

• Types of social media research

• Users publishing content might not have anticipated a particular use

                    Aware           Not aware
Manipulated         Lab studies     A/B testing
Not manipulated     Opt-in study    Observational studies (e.g., sentiment analysis)

Ethics

• Private or public?
  • PRIVATE: a password-protected ‘private’ Facebook group
  • PUBLIC: an open discussion on Twitter in which people broadcast their opinions using a #tag (in order to associate their thoughts on a subject with others’ thoughts on the same subject)

• Public != Non-sensitive

Townsend, L. and Wallace, C., 2016. Social media research: A guide to ethics. University of Aberdeen, pp.1-16.

Ethics: Case Study - Marihuana

• Twitter: #cannabis, #legalize, #ismokeit

• Concerns:
  • Sensitive: illegal activity
  • Some users may be under the age of 18

• Solution:
  • Present results from aggregate data
  • Avoid compromising anonymity: paraphrase quotes (removing user handles)
  • Direct quotes may be used with informed consent from the user (who must be over 18)

Take-home messages

• On real data, human annotators disagree → hard problem

• The best classifier cannot outperform the inter-annotator agreement

• Data representation:
  • BOW + social-media-specific features: punctuation, emojis, …
  • Embeddings + deep learning: need lots of data (unlabeled, distant supervision)

• NLP != English LP

References (incomplete)

• Liu, B., 2015. Sentiment analysis: mining opinions, sentiments, and emotions. Cambridge University Press.

• Zhang, L., Wang, S. and Liu, B., 2018. Deep learning for sentiment analysis: A survey. arXiv preprint arXiv:1801.07883.

• Mohammad, S.M., 2017. Challenges in sentiment analysis. In A Practical Guide to Sentiment Analysis (pp. 61-83). Springer, Cham.

• Taboada, M., 2016. Sentiment analysis: An overview from linguistics. Annual Review of Linguistics, 2(1), pp.325-347.

• Abdul-Mageed, M. and Ungar, L., 2017. EmoNet: Fine-grained emotion detection with gated recurrent neural networks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (Vol. 1, pp. 718-728).

• Mozetič, I., Grčar, M. and Smailović, J., 2016. Multilingual Twitter sentiment classification: The role of human annotators. PloS one, 11(5), p.e0155036.

• Zollo, F., Novak, P.K., Del Vicario, M., Bessi, A., Mozetič, I., Scala, A., Caldarelli, G. and Quattrociocchi, W., 2015. Emotional dynamics in the age of misinformation. PloS one, 10(9), p.e0138740.

• Zollo, F., Sluban, B., Mozetič, I. and Quattrociocchi, W., 2017, November. Toward a Better Understanding of Emotional Dynamics on Facebook. In International Workshop on Complex Networks and their Applications (pp. 365-377). Springer, Cham.

• Kralj Novak, P., Smailović, J., Sluban, B. and Mozetič, I., 2015. Sentiment of emojis. PloS one, 10(12), p.e0144296.

Multilingual:

• Lo, S.L., Cambria, E., Chiong, R. and Cornforth, D., 2017. Multilingual sentiment analysis: from formal to informal and scarce resource languages. Artificial Intelligence Review, 48(4), pp.499-527.

• Korayem, M., Aljadda, K. and Crandall, D., 2016. Sentiment/subjectivity analysis survey for languages other than English. Social network analysis and mining, 6(1), p.75.

• Abdul-Mageed, M., Diab, M. and Kübler, S., 2014. SAMAR: Subjectivity and sentiment analysis for Arabic social media. Computer Speech & Language, 28(1), pp.20-37.

Ethics:

• Townsend, L. and Wallace, C., 2016. Social media research: A guide to ethics. University of Aberdeen, pp.1-16.

Muhammad Abdul-Mageed

Natural Language Processing Lab

School of Information

The University of British Columbia

Vancouver, Canada

[email protected]

(Abdul-Mageed & Kralj Novak, 2018)

Petra Kralj Novak

Department of Knowledge Technologies

Jožef Stefan Institute

Ljubljana, Slovenia

[email protected]

@PetraKraljNovak


