
Institut für Informatik

Text Mining: Wissensrohstoff Text (Text as a Knowledge Resource)

Gerhard Heyer
Universität Leipzig

Sentiment analysis (Stimmungsanalyse)

Background in communication theory

Bühler's organon model (Sprachtheorie)

• Linguistic expressions are the basis of linguistic communication
• Linguistic expressions have three dimensions:
  – the sender (speaker, writer)
  – the receiver (hearer, reader)
  – the thing referred to (objects and events, properties, facts, …)
• With respect to a sender, an (intended) receiver and the thing referred to, linguistic expressions therefore have a threefold function:
  – symptom
  – appeal
  – symbol

3 Prof. Dr. G. Heyer Text Mining – Wissensrohstoff Text

Organon model: schematic representation

[Figure: schematic diagram of the organon model]

Linguistic aspects

• We therefore find linguistic expressions in the form of
  – exclamations (symptom)
  – evaluations (appeal)
  – statements (symbol)
• Example (screenplay, SoKo Leipzig, episode 6, 2004):

ECKHARD BERGER (haltingly)
I don't know what is going to happen… but I am afraid… afraid of my colleagues: Jürgen Wiesehöfer… Michael Nauen… and Sven Lienecke. If something happens to me, then… (an agonizing pause, then) these three men are dangerous… (quietly) possibly murderers.

Further example

[Figure: further example]

Tasks

• Identifying sentiment targets
• Identifying sentiment expressions (words, sentences) and their modifiers
• Computing a sentiment index for sentiment expressions
• Computing a sentiment index for complex sentiment expressions (paragraphs, texts)
• Identifying and parameterizing factors that influence the interpretation of sentiment expressions, e.g.
  – context (subject domain, interests of the evaluator, perspective, …)
  – medium (social networks, email, film, …)
  – language register (politeness, channel restrictions, competence restrictions, …)

Applications

• Text- and interview-based marketing (brand building and brand change, customer expectations)
• Text-based market analyses
• Text- and interview-based social science surveys (European Social Science Monitor)
• An important complement to the analysis of trends and social networks
• eHumanities

Projects and tools

Reuters Sentiment Analysis Workflow

1. News stories are collected in real time
2. Stories are standardised
3. Linguistic analysis is performed on each story to produce sentiment
4. Word sense disambiguation is performed on each story
5. A sentiment feature vector is produced to describe each document
6. The feature vector is matched against a machine learning vector in order to classify story sentiment
7. Analysis results are delivered to clients

Dow Jones/RavenPack Sentiment Analysis Workflow

1. Live news feeds: thousands of news stories are collected daily
2. Data is standardised with timestamps, headlines, text, ...
3. News stories are serialised to storage devices
4. Stories are machine-read and analysed to detect sentiment
5. Results are used to produce time-series trends
6. Overall results are analysed for trends, repeating patterns and algorithmic patterns
7. Analysed data is sent to clients

Sentiment analysis

The following slides are based on slides by Robert Remus and Khurshid Ahmad.

Introduction: definition of sentiment analysis

Sentiment Analysis refers to a broad area of NLP, CL and TM. Generally speaking, it aims to determine the attitude of a speaker or a writer with respect to some topic. The attitude may be their judgment or evaluation, their affective state or their intended emotional communication.

http://en.wikipedia.org/wiki/Sentimentanalysis

Overview: the literature landscape

What do the term sentiment analysis and the often synonymously used term opinion mining cover in the literature?

Subjectivity analysis:
Is a textual unit (e.g. a word, a phrase, a sentence, a document) subjective or objective in character?

Polarity analysis:
Does a textual unit express a positive, negative or neutral sentiment?

Both questions are predominantly treated as instances of a (text) classification problem.

Subjectivity analysis

Most influential work: Wiebe et al. (2004), Wiebe & Riloff (2005)

These classify sentences using, among other things, so-called subjectivity clues:

„Ich glaube, die Qualität ist minderwertig.“ ("I believe the quality is inferior.")

Wiebe et al. (2004) "learn" such I-you co-occurrences and other subjectivity clues, e.g. low-frequency word forms, from a large corpus.

Polarity analysis: typical methods I

Early influential study: Pang et al. (2002)

It classifies short texts using typical statistical methods, among them support vector machines, trained on a hand-annotated training set.

Rule-based approaches, e.g. Kennedy & Inkpen (2006), search sentences for polar words and include modifiers such as negations, diminishers and intensifiers in the analysis:

„Das ist keine schöne Vorstellung“ ("That is not a pleasant thought")
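The rule-based idea of polar words plus modifiers can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of Kennedy & Inkpen (2006): the tiny lexicons, the weights and the two-token modifier window are all assumptions made for the example.

```python
# Toy lexicons; every entry and weight here is an illustrative assumption.
POLAR = {"schön": 1.0, "schöne": 1.0, "gut": 1.0,
         "schlecht": -1.0, "minderwertig": -1.0}
NEGATIONS = {"nicht", "kein", "keine", "keinen"}
INTENSIFIERS = {"sehr": 1.5, "äußerst": 2.0}
DIMINISHERS = {"etwas": 0.5, "kaum": 0.25}


def sentence_polarity(sentence: str, window: int = 2) -> float:
    """Sum polar-word scores, adjusted by modifiers in the preceding tokens."""
    tokens = [t.strip(".,!?;:„“\"").lower() for t in sentence.split()]
    total = 0.0
    for i, token in enumerate(tokens):
        if token not in POLAR:
            continue
        score = POLAR[token]
        # Look back at most `window` tokens for modifiers (assumed heuristic).
        for modifier in tokens[max(0, i - window):i]:
            if modifier in NEGATIONS:
                score = -score
            elif modifier in INTENSIFIERS:
                score *= INTENSIFIERS[modifier]
            elif modifier in DIMINISHERS:
                score *= DIMINISHERS[modifier]
        total += score
    return total


print(sentence_polarity("Das ist keine schöne Vorstellung"))
print(sentence_polarity("Das ist sehr gut"))
```

A real system would lemmatise tokens instead of listing inflected forms such as "schöne" by hand, and would restrict modifier scope syntactically rather than by a fixed window.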

Polarity analysis: typical methods II

Nasukawa & Yi (2003):

[Polarity] analysis involves identification of
- sentiment expressions,
- polarity strength of the expressions, and
- their relationship to the subject.

Nasukawa & Yi (2003) were the first to use the term sentiment analysis in this form.

Polarity analysis: required resources

Rule-based studies need dictionaries that list terms with positive and negative connotations.

Such resources are not freely available for all languages from the outset. How can we create them?
- manual compilation
- transfer of existing resources from other languages
- automatic learning, e.g. by bootstrapping

Lexical resources

‘Modern’ day dictionaries of affect:
• emotion as dimension and
• emotion as ‘finite category’

– good–bad axis (termed the dimension of valence, evaluation or pleasantness)
– active–passive axis (termed the dimension of arousal, activation or intensity)
– strong–weak axis (termed the dimension of dominance or submissiveness)

Dictionaries of affect

‘Modern’ day dictionaries of affect are used to compute the frequency of sentiment words in a text. The aim is usually to ensure that one picks out sentences that use the ‘correct’/unambiguous sense of the sentiment word.

– General Inquirer [Stone et al. 1966];
– Dictionary of Affect [Whissell 1989];
– WordNet Affect [Strapparava and Valitutti 2004];
– SentiWordNet [Esuli and Sebastiani 2006].

General Inquirer

The General Inquirer is a software system that analyses texts to ascertain the psychological attitude/orientation/behaviour of the writer as implicit in his or her writing.

The system has a large database of words, and each word is tagged primarily according to whether it is generally used positively or negatively.

But there are many fine gradations within the tags, ranging from tags that describe active/passive orientation to tags that indicate whether the word belongs to a specific subject category such as economics, is typically used by academics, or is found in legal documents.

General Inquirer categories

Name      No. of words   Meaning
Positiv   1,915          positive outlook
Negativ   2,291          negative outlook
Pstv      1,045          positive outlook
Affil       557          affiliation or supportiveness
Ngtv      1,160          negative outlook
Hostile     833          an attitude or concern with hostility or aggressiveness
Strong    1,902          implying strength
Power       689          positive
Weak        755          negative
Submit      284          submission to authority or power, dependence on others, vulnerability to others, or withdrawal

SentiWS [Remus et al. 2010]

SentiWS, short for SentimentWortschatz, is a German-language dictionary that
- lists 1,650 words with positive and 1,818 words with negative connotations
- gives their part of speech and their inflected forms
- weights each entry by its strength of expression in the interval [-1, +1]

SentiWS has been freely available since June 2010 at
http://wortschatz.informatik.uni-leipzig.de/download/
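To make the three properties concrete (part of speech, inflected forms, weight), here is a hedged sketch of loading such a lexicon and computing a simple sentiment index. The line layout `lemma|POS<TAB>weight<TAB>form1,form2` is an assumption about the download format, and the sample lines, including their tags and inflections, are illustrative rather than copied from the resource.

```python
from typing import Dict, Iterable

# Assumed layout: lemma|POS <TAB> weight <TAB> comma-separated inflected forms.
# These sample lines are illustrative, not taken from the actual SentiWS files.
SAMPLE = [
    "Freude|NN\t1.0\tFreuden",
    "Panne|NN\t-0.901\tPannen",
]


def parse_lexicon(lines: Iterable[str]) -> Dict[str, float]:
    """Map every lemma and inflected form (lower-cased) to its weight."""
    weights: Dict[str, float] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split("\t")
        lemma, _, _pos = fields[0].partition("|")
        weight = float(fields[1])
        forms = fields[2].split(",") if len(fields) > 2 and fields[2] else []
        for form in [lemma, *forms]:
            weights[form.lower()] = weight
    return weights


def sentiment_index(text: str, weights: Dict[str, float]) -> float:
    """Sum the weights of all known word forms in the text (a naive index)."""
    tokens = [t.strip(".,!?;:").lower() for t in text.split()]
    return sum(weights.get(t, 0.0) for t in tokens)


lexicon = parse_lexicon(SAMPLE)
print(sentiment_index("Freude trotz Panne", lexicon))
```

Summing weights is only one possible index; averaging over matched words or normalising by text length are equally plausible choices.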

SentiWS: sources

SentiWS is based on a study of the interactions between sentiment in newspaper texts and blog posts and the movements of the DAX 30 (Remus et al. 2009).

The categories Positiv and Negativ of the English-language General Inquirer were translated automatically with Google Translate, and domain-specific terms such as Finanzkrise (financial crisis) were added.

SentiWS is extended with significant co-occurrences in customer reviews.

It is further extended with overlaps between the terms already identified and semantic groups of the Deutsches Kollokationswörterbuch (Quasthoff 2010).

SentiWS: weighting I

Semi-supervised weighting of the semantic orientation of a word w:

$$\mathrm{SO\text{-}A}(w) = \sum_{pword \in Pwords} A(w, pword) - \sum_{nword \in Nwords} A(w, nword)$$

where Pwords is a set of words with positive connotations and Nwords is a set of words with negative connotations.

w is marked with a positive semantic orientation if SO-A(w) is positive, and with a negative semantic orientation if SO-A(w) is negative. The absolute value of SO-A(w) indicates the strength of the semantic orientation.

SentiWS: weighting II

A(w1, w2) is determined by pointwise mutual information (PMI):

$$\mathrm{PMI}(w_1, w_2) = \log_2 \frac{p(w_1 \,\&\, w_2)}{p(w_1)\, p(w_2)}$$

Paradigms for seed words following [Turney & Littman, 2003] (translated):

Pwords = gut, schön, richtig, glücklich, erstklassig, positiv, großartig, ausgezeichnet, lieb, exzellent, phantastisch

Nwords = schlecht, unschön, falsch, unglücklich, zweitklassig, negativ, scheiße, minderwertig, böse, armselig, mies
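The two weighting formulas (semantic orientation from association, with PMI as the association measure) can be tried out on a toy corpus. The sketch below assumes that probabilities are estimated as the fraction of sentences containing a word, and that pairs which never co-occur contribute 0; both are simplifications for illustration, and the corpus and one-word seed sets are made up.

```python
import math
from typing import List, Set


def pmi(w1: str, w2: str, sentences: List[Set[str]]) -> float:
    """PMI(w1, w2) = log2( p(w1 & w2) / (p(w1) * p(w2)) ), per-sentence counts."""
    n = len(sentences)
    c1 = sum(1 for s in sentences if w1 in s)
    c2 = sum(1 for s in sentences if w2 in s)
    c12 = sum(1 for s in sentences if w1 in s and w2 in s)
    if c12 == 0:
        # Assumption: no evidence of association counts as 0 instead of -inf.
        return 0.0
    return math.log2((c12 / n) / ((c1 / n) * (c2 / n)))


def so_a(word: str, pwords: Set[str], nwords: Set[str],
         sentences: List[Set[str]]) -> float:
    """SO-A(w) = sum of A(w, pword) minus sum of A(w, nword), with A = PMI."""
    positive = sum(pmi(word, p, sentences) for p in pwords)
    negative = sum(pmi(word, q, sentences) for q in nwords)
    return positive - negative


# Toy corpus: each sentence is reduced to its set of word forms.
corpus = [
    {"service", "gut", "freundlich"},
    {"freundlich", "gut"},
    {"panne", "schlecht"},
    {"panne", "schlecht", "teuer"},
]
print(so_a("freundlich", {"gut"}, {"schlecht"}, corpus))
print(so_a("panne", {"gut"}, {"schlecht"}, corpus))
```

The raw SO-A values are unbounded; mapping them to the interval [-1, +1] used by SentiWS would need an additional normalisation step that the slides do not specify.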

SentiWS: weighting III

Examples of words with the weight of their strength of expression, i.e. polarity (general vocabulary and automotive forums):

Word                             Weight
Panne (breakdown)                -0.9010
Schaden (damage)                 -0.5299
fehlerhaft (faulty)              -0.3581
Vertrauen (trust)                +0.3512
Zufriedenheit (satisfaction)     +0.2207
hervorragend (excellent)         +0.5891
Schulden (debts)                 -0.8905
betrügen (to cheat)              -0.8368
freundlich (friendly)            +0.9273
Freude (joy)                     +1.0000

SentiWS: applications

• Analysis of automotive forums and blogs together with Daimler Forschung in Ulm (slides by R. Remus)
• Simple polarity analysis, illustrated by an evaluation of in-depth interviews in marketing (together with Uni HH)

SentiWS: evaluation

Evaluating the weights is difficult. Why?

Example:
A random selection of 5 words each with positive and negative connotations. 7 subjects were asked to produce two rankings that order the words by their strength of expression.

The measured agreement (Cohen's κ) between the subjects' rankings is 0.314.

Cohen's kappa is a measure of the reliability of annotations by several annotators; it expresses how far their agreement deviates from the agreement expected by chance.

In the literature, a value of 0.314 falls into the "fair" band (0.21 to 0.40) of [Landis & Koch, 1977] and is thus regarded as low agreement.
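As a reminder of how kappa corrects raw agreement for chance, here is a minimal sketch for the basic case of two annotators assigning nominal labels. How the value of 0.314 was obtained for seven subjects and rankings is not stated on the slide, so this example does not reproduce that computation; the labels are made up.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(labels_a: Sequence[Hashable],
                 labels_b: Sequence[Hashable]) -> float:
    """kappa = (p_o - p_e) / (1 - p_e) for two annotators, nominal labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's own label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


annotator_1 = ["pos", "pos", "neg", "neg"]
annotator_2 = ["pos", "neg", "neg", "neg"]
print(cohens_kappa(annotator_1, annotator_2))
```

In this toy case the raw agreement is 75 percent, but half of that is expected by chance, which is why kappa comes out much lower than the raw figure.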

SentiWS: evaluation (conclusion)

Since people find it hard to agree on consistent rankings, it is hard to define a gold standard for evaluating sentiments.

Differentiated weights nevertheless seem intuitive (compare the strength of expression of „unklug“ (unwise) vs. „bescheuert“ (idiotic)).

Practical example: analysis of customer expectations

Cf. Torsten Teichert, Gerhard Heyer, Katja Schöntag and Patrick Mairif: Co-Word Analysis for assessing consumer associations: A case study in market research. In: Affective Computing and Sentiment Analysis, Springer Science+Business Media B.V., 2011

Background

• Interviews for specific marketing tasks
• Manually rated and evaluated
  – Concept features (metaphors)
  – Emotional rating
  – Clustering of concepts
• For each interview, features are manually counted and fed into SPSS for factorial analysis
• Purpose of the present work: test and evaluate the efficiency of NLP for detecting and clustering concept features

Sentiment analysis in marketing

• Goal of marketing: gain insight into consumers’ thoughts and feelings regarding specific brands and products
• Widespread use of elicitation and analysis techniques
• As opposed to many text analysis applications, data are not obtained from secondary (internet) sources but from 30 personal in-depth interviews with female consumers
• However, interviews yield a large amount of qualitative data that is hard to handle and needs to be structured in order to be analyzed

Methods and assumptions

• Product categories are assumed to be emotionally laden, reaching far beyond mere functional aspects
• Data elicitation and processing techniques are based on methods derived from human associative memory models and network analysis
• Human Associative Memory (HAM) is a widely accepted model with an increasing number of studies based upon it
  – information is stored in nodes which are linked (associated) with each other, forming a complex network of associations
  – mental activity spreads from active concepts to all related concepts

HAM in marketing

• In the case of brands, the stimulating element can be a brand’s logo: the brand’s associative network is activated and becomes accessible and retrievable
• Activation then spreads to adjacent nodes
• This spread of activation produces a chain, or flow, of thoughts
• A representation of this flow of thoughts can be obtained from the flow of speech, for example when eliciting brand or product associations during an interview
• Elicitation techniques help to access subconscious memory of an episodic, autobiographic, visual and sensory nature, as well as metaphoric descriptions of thoughts, sentiments, and emotions
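The spread of activation from a stimulus to adjacent nodes can be illustrated with a small graph sketch. This is a generic spreading-activation toy, not a model taken from the study: the graph, the link weights, the decay factor and the cut-off threshold are all hypothetical.

```python
from typing import Dict


def spread_activation(graph: Dict[str, Dict[str, float]], start: str,
                      decay: float = 0.5,
                      threshold: float = 0.1) -> Dict[str, float]:
    """Propagate activation from `start`; each hop multiplies by weight * decay."""
    activation = {start: 1.0}
    frontier = [start]
    while frontier:
        next_frontier = []
        for node in frontier:
            for neighbor, weight in graph.get(node, {}).items():
                value = activation[node] * weight * decay
                # Keep only activations above the threshold that improve on
                # what the neighbor already received via another path.
                if value >= threshold and value > activation.get(neighbor, 0.0):
                    activation[neighbor] = value
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return activation


# Hypothetical associative network around a brand logo.
network = {
    "logo": {"shoes": 1.0},
    "shoes": {"comfort": 0.8, "price": 0.2},
}
print(spread_activation(network, "logo"))
```

Weakly linked concepts (here "price") fall below the threshold and stay inactive, which mirrors the idea that only sufficiently associated concepts surface in the flow of thoughts.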

Problems of evaluating questionnaires

• Researchers manually interpret the interviewees’ statements
  – Ambiguity of statements and expressions
  – Subjective rating of the elicited data
• Low replicability of the results
• By convention, inter-rater reliabilities of 70 percent and above are acceptable
• Inter-rater reliability is comparatively low for emotional aspects as opposed to more rational expressions

Goals and the role of NLP

• Text analysis tools offer a solution to this problem
• Reduce the level of subjectivity (to a minimum)
  – feature extraction
  – categorization processes
• Raise the level of replicability
• Reach a higher level of reliability
• The concept of Human Associative Memory guides the data processing and evaluation process


Goals and the role of NLP

Four main assumptions:

1. Words or concepts mentioned together are linked in the mind.

2. The more salient a concept is, the more often it is mentioned during the course of an interview.

3. The stronger the association between two concepts, the more often they are mentioned together.

4. Valence of a concept is indicated by positive or negative

Specific requirements

1. Extraction of features and consolidation of extracted features into meaningful categories.
2. Processing of the data using a co-word analysis on a paragraph level as the basis for the development of associative networks.
3. Consideration of valence expressions for the weighting of individual features.
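A paragraph-level co-word analysis of the kind named in the second requirement can be sketched as follows. The concept list and the paragraphs are made up, and counting each concept at most once per paragraph is an assumption; the study's own counting scheme is not given on the slides.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, Set, Tuple


def co_word_counts(paragraphs: Iterable[str],
                   concepts: Set[str]) -> Tuple[Counter, Counter]:
    """Count per-paragraph occurrences and pairwise co-occurrences of concepts."""
    frequency: Counter = Counter()
    cooccurrence: Counter = Counter()
    for paragraph in paragraphs:
        tokens = {t.strip(".,!?;:").lower() for t in paragraph.split()}
        present = sorted(tokens & concepts)  # sorted so pairs have a fixed order
        frequency.update(present)
        cooccurrence.update(combinations(present, 2))
    return frequency, cooccurrence


# Hypothetical interview paragraphs and concept features.
interview = [
    "Shoes give me comfort.",
    "I enjoy trying on shoes, pure comfort and style.",
    "Style matters.",
]
freq, cooc = co_word_counts(interview, {"shoes", "comfort", "style"})
print(freq)
print(cooc)
```

The frequency counter corresponds to the salience assumption (more mentions, more salient) and the pair counter to the association assumption (mentioned together more often, more strongly linked); the pair counts can serve as edge weights of an associative network.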

Architecture

[Figure: processing pipeline]
1. Definition of text sources
2. Extraction of features and definition of value concepts within sentences
3. Base form per feature (lemmatisation)
4. Adding synonym vectors to features
5. Clustering of features with similar synonym vectors
6. Statistical processing of clusters

Outputs: frequency information per feature; clusters of features
Resources: base forms, stop words, synonyms, value concepts (the diagram marks two update steps)

Architecture (continued)

Synonyms are computed as similarity of the global contexts of word forms.
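One way to read "similarity of global context" is to compare the co-occurrence profiles of two word forms. The sketch below uses sentence-level co-occurrence counts and cosine similarity; both choices, and the toy sentences, are assumptions for illustration, since the slides do not name the similarity measure.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable


def context_vectors(sentences: Iterable[str]) -> Dict[str, Counter]:
    """For each word form, count the other word forms in its sentences."""
    vectors: Dict[str, Counter] = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            for j, other in enumerate(tokens):
                if i != j:
                    vectors[word][other] += 1
    return vectors


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0


toy_sentences = [
    "sneakers feel comfortable",
    "trainers feel comfortable",
    "heels look elegant",
]
vectors = context_vectors(toy_sentences)
print(cosine(vectors["sneakers"], vectors["trainers"]))
print(cosine(vectors["sneakers"], vectors["heels"]))
```

Words that never occur together but share the same contexts ("sneakers", "trainers") come out as highly similar, which is what makes this usable as a rough synonym detector.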

Architecture (continued)

Clustering uses graph-based methods (Chinese Whispers algorithm).
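For orientation, here is a compact sketch of the Chinese Whispers idea: every node starts in its own class and repeatedly adopts the class with the highest total edge weight among its neighbors. The fixed random seed, the tie-breaking by label and the example graph are assumptions added to keep the toy deterministic; they are not part of the study's setup.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def chinese_whispers(edges: Iterable[Tuple[str, str, float]],
                     iterations: int = 10, seed: int = 0) -> Dict[str, str]:
    """Cluster an undirected weighted graph; returns node -> cluster label."""
    neighbors: Dict[str, Dict[str, float]] = defaultdict(dict)
    for a, b, weight in edges:
        neighbors[a][b] = weight
        neighbors[b][a] = weight
    labels = {node: node for node in neighbors}  # each node starts alone
    rng = random.Random(seed)
    nodes = sorted(neighbors)
    for _ in range(iterations):
        rng.shuffle(nodes)  # nodes are visited in random order
        for node in nodes:
            scores: Dict[str, float] = defaultdict(float)
            for other, weight in neighbors[node].items():
                scores[labels[other]] += weight
            # Adopt the strongest neighboring class; ties broken by label.
            labels[node] = max(scores, key=lambda lab: (scores[lab], lab))
    return labels


# Hypothetical feature graph with two unconnected groups.
feature_edges = [
    ("boot", "shoe", 1.0), ("shoe", "sneaker", 1.0), ("boot", "sneaker", 1.0),
    ("happy", "joy", 1.0), ("joy", "bliss", 1.0), ("happy", "bliss", 1.0),
]
clusters = chinese_whispers(feature_edges)
print(clusters)
```

The number of clusters is not fixed in advance, which suits the task of grouping an unknown number of feature categories; on real graphs the result can vary with the visiting order.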

Example: World of shoes [figures, two slides]

Example: Absolute counts [figure]

Example: Synonyms (similar contexts) [figure]

Example: Clusters of similar features [figure]

Processing of clusters

• For each piece of text, the occurrence and co-occurrence of clusters of similar features are counted
• For each piece of text, a factorial analysis is carried out
• The result is visualized using NetDraw

Example: Final conceptual graph for shoes [figure]

Example: Specific findings

• The product category of shoes activates a number of highly emotional associations in the female consumers’ minds.
• The purchasing process (marked in blue):
  – “satisfy/please”, “wear/try on”, “spend time”, “discover”, “examine”, “watch/perceive”, “satisfaction/gratification”, “enjoy”, and “bliss.”
  – Simply put: the process of selecting and buying shoes makes female consumers happy and gives them a feeling of deep satisfaction.
• Service quality and store ambience can therefore be strong differentiating factors for a shoe or shoe store brand.

Conclusions

• A total of 1,938 different features could be extracted from the transcribed interviews.
• Manual coding resulted in 133 and 112 categories for the two raters respectively.
• Inter-rater reliability was 65.3 percent overall
  – 60.6 percent for emotional aspects
  – 66.7 percent for rational aspects

Conclusions: the impact of NLP

• The automatic categorization resulted in 185 categories or clusters
• 100 of the 148 manually developed categories, i.e. 67.6 percent, were identical or similar to the automatically developed categories
• Results are comparable to manual coding
• But they take only a fraction of the time and effort
• The network representation of the main concepts offers a quick yet comprehensive overview of the complete pool of concepts

Literature

Church, K. W. & Hanks, P. (1990). Word Association Norms, Mutual Information, and Lexicography. Computational Linguistics, 16(1), 22--29.

Kennedy, A. & Inkpen, D. (2006). Sentiment Classification of Movie Reviews Using Contextual Valence Shifters. Computational Intelligence, 22(2), 110--125.

Nasukawa, T. & Yi, J. (2003). Sentiment Analysis: Capturing Favorability Using Natural Language Processing. In Proceedings of the 2nd International Conference on Knowledge Capture (pp. 70--77).

Pang, B., Lee, L. & Vaithyanathan, S. (2002). Thumbs up? Sentiment Classification using Machine Learning Techniques. In Proceedings of the 40th Annual Meeting of the ACL (pp. 79--86).

Quasthoff, U. (2010). Deutsches Kollokationswörterbuch. Berlin, New York: deGruyter.

Remus, R., Ahmad, K. & Heyer, G. (2009). Sentiment in German-language News and Blogs, and the DAX. In Proceedings of the Conference on Text Mining Services (TMS), volume XIV of Leipziger Beiträge zur Informatik (pp. 149--158).

Remus, R., Quasthoff, U. & Heyer, G. (2010). SentiWS -- a German-language Resource for Sentiment Analysis. In Proceedings of LREC 2010.

Teichert, T., Heyer, G., Schöntag, K. & Mairif, P. (2011). Co-Word Analysis for assessing consumer associations: A case study in market research. In Affective Computing and Sentiment Analysis. Springer Science+Business Media B.V.

Turney, P. (2002). Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews. In Proceedings of the 40th Annual Meeting of the ACL (pp. 417--424).

Turney, P. & Littman, M. (2003). Measuring Praise and Criticism: Inference of Semantic Orientation from Association. ACM Transactions on Information Systems (TOIS), 21(4), 315--346.

Wiebe, J. & Riloff, E. (2005). Creating Subjective and Objective Sentence Classifiers from Unannotated Texts. In Proceedings of the Sixth International Conference on Intelligent Text Processing and Computational Linguistics (CICLing) (pp. 486--497).

Wiebe, J., Wilson, T., Bruce, R., Bell, M. & Martin, M. (2004). Learning Subjective Language. Computational Linguistics, 30(3), 277--308.