Deep Learning for Natural Language
Sentiment and Affect
Muhammad Abdul-Mageed
The University of British Columbia
(Abdul-Mageed & Kralj Novak, 2018)
Petra Kralj Novak
Jožef Stefan Institute
Outline
• Introduction
• Classical Methods
• Deep Learning Methods – on separate slides
• Multilingual Approaches
• Resources
• Ethics
2
Introduction
3
How far can we go with machines?
4
Information Overload
Source: https://beta.techcrunch.com/2017/06/27/facebook-2-billion-users/
5
Sentiment analysis (broad definition)
• Sentiment analysis and opinion mining is the field of study that analyzes people’s
• opinions,
• sentiments,
• evaluations,
• attitudes, and
• emotions
from written language.
7
Sentiment analysis (narrow definition)
• Sentiment Analysis is the process of computationally determining whether a piece of text is positive, neutral or negative.
• Sentiment polarity & subjectivity
8
Examples
• Opposite orientations in different application domains
  • “This camera sucks.”
  • “This vacuum cleaner really sucks.”
• Sarcasm
  • “What a great car! It stopped working in two days.”
• Opinions without sentiment words
  • “This washer uses a lot of water.”
• Ambiguous
  • “It is my birthday today.”
• Language specific
  • “Na ECML konferenčni večerji smo se zabavali ob čudoviti glasbi in plesu.” (Slovenian: “At the ECML conference dinner we enjoyed wonderful music and dancing.”)
10
Granularity level
• Word
• Sentence, paragraph
• Document
11
Classical methods
12
Sentiment lexicons
• Good, wonderful, amazing
• Bad, poor, terrible
• Cost someone an arm and a leg
13
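A word list like the one on this slide can be turned into a very simple scorer. The following is a minimal Python sketch, not the method used in the tutorial: the scores, the names `TOY_LEXICON` and `TOY_PHRASES`, and the zero threshold are all illustrative assumptions.

```python
import re

# Hypothetical toy lexicon: the scores are illustrative, not from a published resource.
TOY_LEXICON = {
    "good": 1, "wonderful": 2, "amazing": 2,
    "bad": -1, "poor": -1, "terrible": -2,
}
# Multi-word expressions are matched before single words.
TOY_PHRASES = {"an arm and a leg": -2}


def lexicon_score(text):
    """Sum the scores of all lexicon entries found in the text."""
    lowered = text.lower()
    score = 0
    for phrase, value in TOY_PHRASES.items():
        score += lowered.count(phrase) * value
        lowered = lowered.replace(phrase, " ")  # avoid re-counting phrase words
    tokens = re.findall(r"[a-z']+", lowered)
    return score + sum(TOY_LEXICON.get(token, 0) for token in tokens)


def lexicon_polarity(text):
    """Map the summed score to a three-way label."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Calling `lexicon_polarity("This washer uses a lot of water.")` returns a neutral label under these assumptions, which illustrates why opinions without sentiment words are hard for purely lexical methods.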
Lexical sentiment analysis: Loughran and McDonald Sentiment Word Lists
14
Lexical sentiment analysis of mainstream news: Bitcoin
http://newstream.ijs.si/
15
Lexical vs. machine learning methods
Lexical Machine learning
Maite Taboada, Sentiment Analysis: An Overview from Linguistics. Annual Review of Linguistics 2016 2:1, 325-347
16
Stance analysis
• Stance detection is the task of automatically determining whether the author of a text is in favor of, neutral toward, or against a target
• Example:
  • Target: legalization of abortion
  • Tweet: “A fetus has rights too! Make your voice heard.”
17
Slovenian presidential elections 2012
• Stance analysis on manually annotated Twitter data
  • Each tweet is annotated as in favor of, neutral toward, or against each of the candidates
• Linear kernel SVM model
18
[Credit: https://www.youtube.com/watch?v=Ixkp0T3-1YE]
Emotion in Public Discourse
20
Source: https://www.theatlantic.com/health/archive/2015/02/hard-feelings-sciences-struggle-to-define-emotions/385711
Hard Feelings: Science’s Struggle to Define Emotions
21
What is emotion?
• “[E]veryone knows what an emotion is, until asked to give a definition. Then, it seems, no one knows” (Fehr & Russell, 1984)
• Definitions vary as a function of:
  • discipline or approach
  • time or culture
• ~ 100 definitions of emotion (Kleinginna & Kleinginna, 1984)
22
Models of emotion
• Categorical models of basic emotion
(e.g., Matsumoto & Ekman, 2009; Panksepp, 2005)
• Bidimensional models
(e.g., Russell, 2009)
• Appraisal models
(e.g., Arnold, 1950; 1960; Lazarus, 1991; Scherer et al., 2001)
• Other…
23
Basic emotion models
• Categorical models (e.g., Matsumoto & Ekman, 2009; Panksepp, 2005)
anger, disgust, fear,
joy, sadness, surprise
24
Bidimensional Models
[Figure: two-dimensional plane with a valence axis (frustrated to pleased) and an arousal axis (sleepy to aroused)]
25
Bidimensional Models
[Figure: valence and arousal axes]
26
Plutchik Wheel of Emotions
27
3 Circles of Arousal: Core, Primary, and Secondary (p1, p2, p3)
8 dimensions
28
Arousal
29
2 Dimensions of Valence
30
Learning emotion
• Multiclass classification task
• Similar to learning sentiment (text classification)
31
The sentiment analysis pipeline
32
The sentiment analysis pipeline
[Figure: pipeline diagram with numbered steps 1 to 5; labels: millions of documents, thousands of documents, classifier, millions of documents]
33
Data acquisition and labeling
• Acquisition: Relevant data
• Annotation:
  • Representative sample
  • Sample size: 20 – 100K
  • Duplicates
• Annotators
  • Clear instructions with examples
  • Annotator self-agreement
  • Inter-annotator agreement
Zollo, F., Novak, P.K., Del Vicario, M., Bessi, A., Mozetič, I., Scala, A., Caldarelli, G. and Quattrociocchi, W., 2015. Emotional dynamics in the age of misinformation. PloS one, 10(9), p.e0138740.
34
Size of training dataset: saturation point
Monitor classifier performance while feeding increasingly larger training sets
[Figure: two panels plotting inter-annotator agreement and classifier performance; one panel: saturation point not reached at 90,000 tweets; other panel: saturation point at 70,000 tweets]
Mozetič, I., Grčar, M. and Smailović, J., 2016. Multilingual Twitter sentiment classification: The role of human annotators. PloS one, 11(5), p.e0155036.
35
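Monitoring for a saturation point can be sketched as a check on a learning curve. This is a minimal Python sketch under stated assumptions: the function name `saturation_point`, the `min_gain` threshold, and the example curves are hypothetical and are not values from the cited paper.

```python
def saturation_point(curve, min_gain=0.005):
    """Return the smallest training-set size after which no larger
    training set improves the score by at least min_gain.

    curve: list of (training_set_size, score) pairs.
    Returns None if the score is still improving at the largest size.
    """
    curve = sorted(curve)
    for i, (size, score) in enumerate(curve[:-1]):
        best_later = max(later_score for _, later_score in curve[i + 1:])
        if best_later - score < min_gain:
            return size
    return None


# Hypothetical learning curves (size, score); the numbers are made up.
flattening = [(10000, 0.50), (30000, 0.58), (50000, 0.62),
              (70000, 0.64), (90000, 0.642)]
still_rising = [(10000, 0.40), (50000, 0.50), (90000, 0.60)]

print(saturation_point(flattening))
print(saturation_point(still_rising))
```

In practice each score would come from retraining and evaluating the classifier on a held-out set at every size; the threshold is a judgment call that depends on how noisy the evaluation is.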
The role of human sentiment annotators
Comparison of annotator self-agreement, inter-annotator agreement, and an automated sentiment classifier in terms of Krippendorff’s Alpha.
Mozetič, I., Grčar, M. and Smailović, J., 2016. Multilingual Twitter sentiment classification: The role of human annotators. PloS one, 11(5), p.e0155036.
36
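For readers unfamiliar with Krippendorff’s Alpha, the following Python sketch computes it from a coincidence matrix. It is an illustration, not the implementation used in the cited study: the function names, the default nominal distance, and the optional squared-difference distance for numerically coded labels are assumptions made here.

```python
from collections import Counter
from itertools import permutations


def nominal_delta(a, b):
    """Distance between two labels when only equality matters."""
    return float(a != b)


def interval_delta(a, b):
    """Squared difference, for labels coded as numbers (e.g. -1, 0, 1)."""
    return float((a - b) ** 2)


def krippendorff_alpha(units, delta=nominal_delta):
    """units: one list of labels per annotated item.
    Items with fewer than two labels are ignored.
    Raises ZeroDivisionError if only one label value ever occurs."""
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue
        # Every ordered pair of labels within an item, weighted by 1/(m-1).
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)
    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    observed = sum(w * delta(a, b) for (a, b), w in coincidences.items()) / n
    expected = sum(totals[a] * totals[b] * delta(a, b)
                   for a in totals for b in totals) / (n * (n - 1))
    return 1.0 - observed / expected
```

The same function covers self-agreement (one annotator labeling the same item twice) and inter-annotator agreement (different annotators labeling the same item); only the construction of `units` differs.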
Distant supervision
To build a dataset
• Emoticon/emoji
• #tags
• Seed words (good, bad)
Remove the hints while training
37
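The idea of labeling by hints and then removing them can be sketched as follows. This is a minimal Python sketch: the hint sets and the function name `distant_label` are hypothetical, and real hint lists would be much larger.

```python
# Hypothetical hint sets; any emoticon, emoji, hashtag or seed word could be used.
POSITIVE_HINTS = {":)", ":-)", "😊", "#happy"}
NEGATIVE_HINTS = {":(", ":-(", "😞", "#sad"}
ALL_HINTS = POSITIVE_HINTS | NEGATIVE_HINTS


def distant_label(text):
    """Return (text_without_hints, label), or None when the text has
    no hint or has conflicting hints."""
    tokens = text.split()
    has_positive = any(token in POSITIVE_HINTS for token in tokens)
    has_negative = any(token in NEGATIVE_HINTS for token in tokens)
    if has_positive == has_negative:
        return None
    label = "positive" if has_positive else "negative"
    # Remove the hints so the classifier cannot simply memorize them.
    cleaned = " ".join(token for token in tokens if token not in ALL_HINTS)
    return cleaned, label
```

Texts for which the function returns `None` are simply left out of the distantly supervised training set in this sketch.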
The sentiment analysis pipeline
38
2, 3 or more class problem?
• 2-class problem
  • Whether a review posted online (of a movie, a book, or a consumer product) is positive or negative towards the item being reviewed
• 3-class problem
  • Whether the sentiment of the text is positive, neutral or negative
• More-class problem
  • Emotion detection
39
Exercise: confusion matrix of a classifier
40
Exercise: confusion matrix of a classifier
• Accuracy = 80% in both cases
• The errors in the first matrix are more severe than in the second
41
Problem formulation: Ordinal regression
• Three class problem: negative, neutral, positive
• Error from positive to negative is bigger than the error from positive to neutral
• Measures of quality:
  • Accuracy, Accuracy@1
  • F1
  • MAE, MSE
  • Cohen’s Kappa
  • Krippendorff’s Alpha
43
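The difference between accuracy and ordinal error measures can be seen on two small confusion matrices. This Python sketch uses hypothetical matrices (not the ones from the exercise slides) and assumes that Accuracy@1 means "prediction is at most one class away from the true class".

```python
def ordinal_metrics(cm):
    """cm[i][j] = number of items of true class i predicted as class j,
    with classes ordered negative (0), neutral (1), positive (2)."""
    size = len(cm)
    cells = [(i, j, cm[i][j]) for i in range(size) for j in range(size)]
    total = sum(count for _, _, count in cells)
    return {
        "accuracy": sum(c for i, j, c in cells if i == j) / total,
        "accuracy_within_1": sum(c for i, j, c in cells if abs(i - j) <= 1) / total,
        "mae": sum(abs(i - j) * c for i, j, c in cells) / total,
        "mse": sum((i - j) ** 2 * c for i, j, c in cells) / total,
    }


# Hypothetical matrices, both with 80 of 100 items on the diagonal.
severe_errors = [[30, 0, 10],   # negatives mistaken for positives
                 [0, 20, 0],
                 [10, 0, 30]]   # positives mistaken for negatives
mild_errors = [[30, 10, 0],     # negatives mistaken for neutral
               [0, 20, 0],
               [0, 10, 30]]     # positives mistaken for neutral

print(ordinal_metrics(severe_errors))
print(ordinal_metrics(mild_errors))
```

Both matrices have the same accuracy, but MAE and MSE are larger for the first one because its errors cross from one end of the scale to the other.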
Exercise: confusion matrix of a classifier
• First matrix: Accuracy = 80%, F1 = 0.71
• Second matrix: Accuracy = 80%, F1 = 0.83
44
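How F1 can differ while accuracy stays the same is shown in the Python sketch below. The matrices are hypothetical and do not reproduce the 0.71 and 0.83 values on the slide, and averaging F1 over only the positive and negative classes is one common choice that is assumed here.

```python
LABELS = ("negative", "neutral", "positive")


def f1_per_class(cm, labels=LABELS):
    """cm[i][j] = number of items of true class i predicted as class j."""
    scores = {}
    for k, label in enumerate(labels):
        true_pos = cm[k][k]
        false_pos = sum(row[k] for row in cm) - true_pos
        false_neg = sum(cm[k]) - true_pos
        denominator = 2 * true_pos + false_pos + false_neg
        scores[label] = 2 * true_pos / denominator if denominator else 0.0
    return scores


def f1_pos_neg(cm):
    """Average of the F1 scores of the positive and negative classes."""
    scores = f1_per_class(cm)
    return (scores["positive"] + scores["negative"]) / 2


# Hypothetical matrices with identical accuracy (80 of 100 correct).
matrix_a = [[30, 0, 10], [0, 20, 0], [10, 0, 30]]
matrix_b = [[30, 10, 0], [0, 20, 0], [0, 10, 30]]

print(round(f1_pos_neg(matrix_a), 3), round(f1_pos_neg(matrix_b), 3))
```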
The sentiment analysis pipeline
45
Classifier
Traditional approaches: SVM, Naïve Bayes
Neural networks
46
Data representation 1: BOW
• Each word is one dimension
• Each document is one point on a hypersphere
47
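The two bullets above can be made concrete with a small Python sketch. It assumes a simple regular-expression tokenizer and raw counts as weights; the function names are hypothetical. Dividing each count vector by its Euclidean length is what places every document on the unit hypersphere.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def bow_vectors(docs):
    """Return (vocabulary, vectors): one dimension per word,
    one unit-length count vector per document."""
    vocabulary = sorted({token for doc in docs for token in tokenize(doc)})
    index = {word: i for i, word in enumerate(vocabulary)}
    vectors = []
    for doc in docs:
        vector = [0.0] * len(vocabulary)
        for word, count in Counter(tokenize(doc)).items():
            vector[index[word]] = float(count)
        norm = math.sqrt(sum(value * value for value in vector))
        if norm > 0:
            vector = [value / norm for value in vector]
        vectors.append(vector)
    return vocabulary, vectors


docs = ["This camera sucks", "This vacuum cleaner really sucks"]
vocabulary, vectors = bow_vectors(docs)
# On the unit hypersphere the dot product equals the cosine similarity.
similarity = sum(a * b for a, b in zip(vectors[0], vectors[1]))
```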
Social media specific sentiment features
48
Data representation: Additional features
• BOW (bag of words) + additional features
  • Word N-grams: (Justin Bieber, video games, not happy)
  • Punctuation
  • Emoticons and emoji
  • Preprocessing: baaaaaaad → baaad
  • Capitalization: SCREAMING
• Language specific
  • Lists of positive and negative words: SentiWordNet
  • Spellings of swearing: f**k
  • Language (keyboard) specific emoticons: ಠ_ಠ , ƸӜƷ
49
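Some of the features listed above are easy to extract with regular expressions. The Python sketch below is illustrative only: the feature names, the restriction to ASCII letters, and the choice to shorten repeated letters to exactly three are assumptions.

```python
import re

# A letter followed by two or more copies of itself (three or more in total).
ELONGATION = re.compile(r"([A-Za-z])\1{2,}")


def normalize_elongation(text):
    """Shorten any run of three or more identical letters to three."""
    return ELONGATION.sub(r"\1\1\1", text)


def social_media_features(text):
    """Extract a few surface features that complement a bag of words."""
    tokens = re.findall(r"[A-Za-z']+", text)
    lowered = [normalize_elongation(token.lower()) for token in tokens]
    return {
        "exclamations": text.count("!"),
        "question_marks": text.count("?"),
        # Single letters such as "I" are not counted as screaming.
        "all_caps_words": sum(1 for t in tokens if len(t) > 1 and t.isupper()),
        "has_elongation": bool(ELONGATION.search(text)),
        "bigrams": list(zip(lowered, lowered[1:])),
    }
```

Keeping three letters rather than collapsing to one preserves the signal that a word was elongated while mapping "baaaaaaad" and "baaaad" to the same token.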
Precision-recall tuning
• Precision & Recall should be similar for both the positive and the negative class
50
Deep learning methods
51
Multilingual sentiment analysis
• Lo, S.L., Cambria, E., Chiong, R. and Cornforth, D., 2017. Multilingual sentiment analysis: from formal to informal and scarce resource languages. Artificial Intelligence Review, 48(4), pp.499-527.
• Korayem, M., Aljadda, K. and Crandall, D., 2016. Sentiment/subjectivity analysis survey for languages other than English. Social network analysis and mining, 6(1), p.75.
52
NLP != English LP
[Figure panels: Languages in the world; Languages on Twitter]
Image from https://fledu.uz
53
Multilingual sentiment analysis approaches
A. Translation-based sentiment analysis
B. Corpus based
C. Lexicon-based sentiment analysis
D. Machine learning approaches
E. Language independent approaches
54
Translation based sentiment analysis (1)
Original document → [machine translation] → English document → [apply English sentiment analysis] → Sentiment classification
55
Translation based sentiment analysis (2)
Sentiment labeled corpus (English) → [machine translate to target language] → Corpus in target language → [build an ML model] → Sentiment model for target language
Original document → [sentiment model for target language] → Sentiment classification
56
Corpus based
Parallel corpora → [apply “English” sentiment model] → [transfer labels] → [build sentiment model for target language] → Sentiment model for target language
57
Lexicon-based sentiment analysis
• Build a sentiment lexicon for target language
  • Translation of lexica (+ check 10,000 most frequent words)
  • WordNet (words and semantic relations) + seed words
58
Machine learning approaches
1. Labeled dataset
  • Manual annotation
  • Distant supervision
    • Emoji/emoticon
    • Positive and negative #tags
    • Seed words
2. Build a machine learning model
59
Languages of rich morphology
60
(Abdul-Mageed, 2018)
Arabic
61
Segmentation
65
POS Tagging
66
ASMA: Segmentation & Morphosyntactic Disambiguation
ASMA: A Real-World Example
67
Modeling in lexical space
Modeling in morphosyntactic space
(Abdul-Mageed, 2015. Dissertation)
68
Resources & Venues
70
Sentiment Resources
• Lexicons
• Models & libraries
• Annotated sentiment data
71
Lexicons
• AFINN
• Bing Liu's Opinion Lexicon
• MPQA Subjectivity Lexicon
• Harvard General Inquirer
• SentiWordNet
• Loughran-McDonald Sentiment Word Lists
• Sentiment Lexicons for 81 Languages
• Emoji sentiment ranking
• Emoticon Sentiment Lexicon
• Sifat (Arabic adjectives)
72
AFINN
• A list of English words rated for valence
• Scale [-5,5]
• 2477 words and phrases
• Licence: Open Database License (ODbL) v1.0
• An evaluation of the word list is available in: Finn Årup Nielsen, “A new ANEW: Evaluation of a word list for sentiment analysis in microblogs”, Proceedings of the ESWC2011 Workshop on 'Making Sense of Microposts': Big things come in small packages, 718 in CEUR Workshop Proceedings: 93-98. May 2011. http://arxiv.org/abs/1103.2903
73
Emoji sentiment ranking
• Sentiment of 751 (most common) emojis
• Constructed from 75,000 manually sentiment-labeled tweets containing emoji, in 13 European languages
• Similar format to SentiWordNet
• Kralj Novak P, Smailović J, Sluban B, Mozetič I (2015) Sentiment of Emojis. PLoS ONE 10(12): e0144296. https://doi.org/10.1371/journal.pone.0144296
• http://kt.ijs.si/data/Emoji_sentiment_ranking/
74
Models
• TextBlob
  • PatternAnalyzer: based on a lexicon of adjectives
  • NaiveBayesAnalyzer: an NLTK classifier trained on a movie reviews corpus
  • (Python) https://textblob.readthedocs.io/en/dev/
• ipublia sentiment analysis
  • English, German, French and Italian
  • (Python, REST) https://github.com/ipublia/sentiment-analysis
75
Annotated sentiment data
• Twitter sentiment for 15 European languages (1,643,735 manually annotated tweets)
• SemEval competition data
• Bing Liu’s customer reviews and other datasets
• Product reviews: this dataset consists of a few million Amazon customer reviews with star ratings, useful for training a sentiment analysis model.
• Restaurant reviews: this dataset consists of 5.2 million Yelp reviews with star ratings.
• Movie reviews: this dataset consists of 1,000 positive and 1,000 negative processed reviews. It also provides 5,331 positive and 5,331 negative processed sentences / snippets.
• Fine food reviews: this dataset consists of ~500,000 food reviews from Amazon. It includes product and user information, ratings, and a plain text version of every review.
• Twitter airline sentiment on Kaggle: this dataset consists of ~15,000 labeled tweets (positive, neutral, and negative) about airlines.
• First GOP Debate Twitter Sentiment: this dataset consists of ~14,000 labeled tweets (positive, neutral, and negative) about the first GOP debate of the 2016 presidential campaign.
76
Emotion Resources
• Lexicons
  • NRC emotion lexicon
  • UBC emotion lexicon (ongoing work)
• Data
• SemEval 2007; 2018; 2019
• Aman and Szpakowicz (2007)
• Abdul-Mageed and Ungar (2017)
• Alhuzali, Abdul-Mageed, and Ungar (2018) (Arabic)
77
Biases & Ethics
78
Biases: Social media data is not representative
• Demographic differences between social media users and “target population”
• Behaviour biases
• Linking biases
• Temporal variations
79
Ethics
• Types of social media research
• Users publishing content might have not anticipated a particular use
                 Aware          Not aware
Manipulated      Lab studies    A/B testing
Not manipulated  Opt-in study   Observational studies (sentiment analysis)
80
Ethics
• Private or public?
  • PRIVATE: a password protected ‘private’ Facebook group
  • PUBLIC: an open discussion on Twitter in which people broadcast their opinions using a #tag (in order to associate their thoughts on a subject with others’ thoughts on the same subject)
• Public != Non-sensitive
Townsend, L. and Wallace, C., 2016. Social media research: A guide to ethics. University of Aberdeen, pp.1-16.
81
Ethics: Case Study - Marihuana
• Twitter: #cannabis, #legalize, #ismokeit
• Concerns:
  • Sensitive: illegal activity
  • Users may be under the age of 18
• Solution:
  • Present results from aggregate data
  • Avoid compromising anonymity: paraphrase quotes and remove ID handles
  • Direct quotes may be used with informed consent from the platform user (over 18)
82
Take-home messages
• On real data, human annotators disagree → hard problem
• The best classifier cannot outperform the inter-annotator agreement
• Data representation
  • BOW + Social media specific features: punctuation, emojis, …
  • Embedding + deep learning: need lots of data (unlabeled, distant supervision)
• NLP != English LP
83
References (incomplete)
• Liu, B. Sentiment analysis: mining opinions, sentiments, and emotions. Cambridge University Press, 2015.
• Zhang, L., Wang, S., & Liu, B. (2018). Deep Learning for Sentiment Analysis: A Survey. arXiv preprint arXiv:1801.07883.
• Mohammad, S. M. Challenges in sentiment analysis. In A Practical Guide to Sentiment Analysis (pp. 61-83). Springer, Cham, 2017.
• Taboada, M. Sentiment Analysis: An Overview from Linguistics. Annual Review of Linguistics 2016 2:1, 325-347
• Abdul-Mageed, M. and Ungar, L., 2017. Emonet: Fine-grained emotion detection with gated recurrent neural networks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (Vol. 1, pp. 718-728).
• Mozetič, I., Grčar, M. and Smailović, J., 2016. Multilingual Twitter sentiment classification: The role of human annotators. PloS one, 11(5), p.e0155036.
• Zollo, F., Novak, P.K., Del Vicario, M., Bessi, A., Mozetič, I., Scala, A., Caldarelli, G. and Quattrociocchi, W., 2015. Emotional dynamics in the age of misinformation. PloS one, 10(9), p.e0138740.
• Zollo, F., Sluban, B., Mozetič, I. and Quattrociocchi, W., 2017, November. Toward a Better Understanding of Emotional Dynamics on Facebook. In International Workshop on Complex Networks and their Applications (pp. 365-377). Springer, Cham.
• Kralj Novak, P., Smailović, J., Sluban, B., & Mozetič, I. (2015). Sentiment of emojis. PloS one, 10(12), e0144296.
Multilingual:
• Lo, S.L., Cambria, E., Chiong, R. and Cornforth, D., 2017. Multilingual sentiment analysis: from formal to informal and scarce resource languages. Artificial Intelligence Review, 48(4), pp.499-527.
• Korayem, M., Aljadda, K. and Crandall, D., 2016. Sentiment/subjectivity analysis survey for languages other than English. Social network analysis and mining, 6(1), p.75.
• Abdul-Mageed, M., Diab, M. and Kübler, S., 2014. SAMAR: Subjectivity and sentiment analysis for Arabic social media. Computer Speech & Language, 28(1), pp.20-37.
Ethics:
• Townsend, L. and Wallace, C., 2016. Social media research: A guide to ethics. University of Aberdeen, pp.1-16.
84
Muhammad Abdul-Mageed
Natural Language Processing Lab
School of Information
The University of British Columbia
Vancouver, Canada
(Abdul-Mageed & Kralj Novak, 2018)
Petra Kralj Novak
Department of Knowledge Technologies
Jožef Stefan Institute
Ljubljana, Slovenia
@PetraKraljNovak