multimodal dialogue analysis

Post on 10-Feb-2016

Page 1: Multimodal Dialogue Analysis

Multimodal Dialogue Analysis

INOUE, Masashi

Yamagata University

29-Nov-09 @FIU Dr. Tao Li’s Group

Name of the discipline

• Computational Social Linguistics
  – Society influences language use
• Conversation Analysis (CA)
• Discourse Analysis (DA)

OVERVIEW (1/5)

Layers of investigation

Data
• Sensing (Objective)
• Device development and signal processing

Information
• Event detection (Ambiguous)
• Pattern recognition

Knowledge
• Pattern Discovery (Subjective)
• Data mining

Major Conferences and Journals

• ICMI-MLMI
  – ICMI (User Interface) and MLMI (Dialogue Analysis) merged in 2009
• Some in multimedia or NLP conferences
  – ACM Multimedia
  – ACL
  – etc.

Research Initiatives In Europe

• CHIL Corpus
• AMI Corpus
  – Augmented multi-party interaction
  – http://corpus.amiproject.org/
• SSPNET
  – A European network of excellence in social signal processing
  – http://sspnet.eu/

FIRST EXAMPLE (2/5)

Paper 1 (ICMI-MLMI 2009)

• “Discovering group nonverbal conversational patterns with topics” by Dinesh Babu Jayagopi, Daniel Gatica-Perez (IDIAP)
• Goal: Understand group dynamics (= leadership) from conversational video

Method

• Feature descriptor
  – Time slices of conversation (documents)
    • different time scales show different patterns
    • 1 min scale: monologue vs. 5 min scale: a lot of interaction
  – Speaking energy/Speaking status
• Bag of non-verbal patterns (NVP)
  – speech length, # of turns, successful interruptions
• Method (what’s new)
  – Unsupervised
  – Topic model (LDA): which feature is prominent
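
As a rough illustration of the time-slice idea above, the following sketch cuts one speaker's binary speaking status into fixed-length slices and derives two of the listed cues (speech length and number of turns) per slice. The function name `slice_features`, the frame-based representation, and the sample data are assumptions made for illustration; they are not taken from the paper.

```python
from itertools import groupby

def slice_features(status, slice_len):
    """Return (speaking length, number of turns) for each time slice.

    status is a list of 0/1 flags, one per frame (hypothetical format).
    """
    features = []
    for start in range(0, len(status), slice_len):
        window = status[start:start + slice_len]
        speaking_length = sum(window)
        # A turn is counted as one maximal run of consecutive speaking frames.
        turns = sum(1 for value, _ in groupby(window) if value == 1)
        features.append((speaking_length, turns))
    return features

# Hypothetical speaking status for one participant.
status = [1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0]
short_scale = slice_features(status, 6)   # two slices of 6 frames
long_scale = slice_features(status, 12)   # one slice of 12 frames
print(short_scale, long_scale)
```

Changing `slice_len` changes what each "document" looks like, which is the point the slide makes about 1 min versus 5 min scales.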

Feature categories

1. Generic group patterns: group as a whole
  – silence, one-speaker, two-speaker, other, evenly
2. Leadership patterns:
  – proposed in social psychology field
  – position of designated leader (‘L’) or someone else (‘NL’): taking maximum values
• 21 dimensional feature vectors (vocabulary)
• 6 tokens per slice (words)
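
One possible reading of the generic group patterns is a per-frame label based on how many participants speak at once. The sketch below is a simplification under that assumption: the labels `silence`, `one-speaker`, `two-speaker`, and `other` come from the slide, while the mapping rule, the function names, and the data are hypothetical, and the `evenly` pattern and the leadership patterns are not covered.

```python
from collections import Counter

def group_pattern(frame):
    """Label one frame by the number of simultaneous speakers (assumed rule)."""
    speakers = sum(frame)
    if speakers == 0:
        return "silence"
    if speakers == 1:
        return "one-speaker"
    if speakers == 2:
        return "two-speaker"
    return "other"

def slice_tokens(frames):
    """Count pattern labels in one slice, giving a bag of 'words' for that slice."""
    return Counter(group_pattern(frame) for frame in frames)

# Hypothetical slice: each tuple holds the speaking status of 4 participants.
frames = [(0, 0, 0, 0), (1, 0, 0, 0), (1, 1, 0, 0), (1, 1, 1, 0), (0, 1, 0, 0)]
print(slice_tokens(frames))
```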

Data

• AMI Corpus
  – Meeting for product design
  – 17 meetings (17 hours)
  – 4 participants / group: ‘Project Manager’, ‘User Interface specialist’, ‘Marketing Expert’, and ‘Industrial Designer’

Result (3 topics)

Result (visual)

[Figure: visualization of the 3 topics]

Can be used to characterize groups

Validation

• Comparison with ground truth (GT):
  – 5 min scale, 8 top docs per class
  – 3 annotators / meeting
  – GT is the majority-agreed label
• Accuracy: 62%, 100%, 75% for each class
  – Autocratic, Participative, Free rein
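
The validation steps above can be sketched as a majority vote over annotator labels followed by a per-class accuracy. The function names and the sample labels below are hypothetical. With 8 documents per class, 5 correct documents would give 62.5%, which is in line with the first reported figure, but the actual counts are not stated on the slide.

```python
from collections import Counter

def majority_label(labels):
    """Return the label chosen by more than half of the annotators, else None."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None

def per_class_accuracy(predicted, truth):
    """For each predicted class, the fraction of documents whose GT label matches."""
    correct, total = Counter(), Counter()
    for p, t in zip(predicted, truth):
        total[p] += 1
        if p == t:
            correct[p] += 1
    return {cls: correct[cls] / total[cls] for cls in total}

# Hypothetical data: 8 top documents assigned to the "autocratic" class.
predicted = ["autocratic"] * 8
truth = ["autocratic"] * 5 + ["participative"] * 3
print(majority_label(["autocratic", "autocratic", "free rein"]))
print(per_class_accuracy(predicted, truth))
```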

Questions

• Feature representation (Are they good?)
  – Some magic numbers (e.g., 6 words/slice)
  – Balancing # of vocabulary and # of words
• Modeling technique (Is LDA a valid one?)
  – Can we regard the NVC as words and Group Dynamics as topics?
  – Arbitrary number of topics, different interpretation

EXAMPLE 2 (3/5)

Paper 2 (MSSSC 2009)

• “Sensor-Based Organizational Engineering” by Daniel Olguin-Olguin, Alex (Sandy) Pentland (MIT Media Lab)
• [16] Olguin-Olguin, D., & Pentland, A. (2008). Social Sensors for Automatic Data Collection. 14th Americas Conference on Information
• Social signals/Reality mining/Sensible organizations
  – Introduction to their research projects
  – Use of sensors to collect data in groups
  – Combination of textual and survey data
  – Business communication domain (organizational behavior)

Method

• Sensor data
  – Face/body/vocal behavior/space and environment/affective behavior
  – cameras, infrared sensors, accelerometers, gyroscopes, inclinometers, pressure sensors, microphones, vibration, ...
• Pattern recognition
• Social network analysis
  – Who talks to whom
  – How well they are communicating
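
The "who talks to whom" part of the social network analysis can be sketched as counting interactions per pair of people. This is a minimal illustration: the event format, the names, and both function names are assumptions, and the paper's actual sensor pipeline is not reproduced here.

```python
from collections import Counter

def interaction_counts(events):
    """Count interactions per unordered pair of people."""
    counts = Counter()
    for a, b in events:
        counts[tuple(sorted((a, b)))] += 1
    return counts

def interaction_degree(counts):
    """Total number of interactions each person takes part in."""
    degree = Counter()
    for (a, b), n in counts.items():
        degree[a] += n
        degree[b] += n
    return degree

# Hypothetical detected face-to-face interactions.
events = [("ann", "bob"), ("bob", "ann"), ("ann", "cho"), ("bob", "cho"), ("ann", "bob")]
counts = interaction_counts(events)
print(counts)
print(interaction_degree(counts))
```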

Case 1

• Communication in a call center
  – wearable sensor devices (sociometric badge)
  – completion time difference (productivity)
  – 2,200 hours of data (100 hours per employee) and 880 reciprocal e-mails
• Findings
  – more interaction implies lower productivity
  – higher variance in physical activity implies lower productivity

Case 2

• Communication in a marketing division
  – face-to-face vs. emails
  – questionnaire (satisfaction)
• Findings:
  – Total comm = email + face-to-face
  – Total comm correlates negatively with satisfaction
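
The finding above combines two quantities and then correlates the sum with a questionnaire score. A minimal sketch using the Pearson coefficient follows; the slide does not say which correlation measure was used, and all numbers below are made up for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical per-person counts and satisfaction scores.
email = [10, 4, 7, 2]
face_to_face = [5, 3, 6, 1]
satisfaction = [2, 4, 3, 5]

# Total comm = email + face-to-face, as defined on the slide.
total_comm = [e + f for e, f in zip(email, face_to_face)]
r = pearson(total_comm, satisfaction)
print(r)  # negative for this made-up data
```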

Questions

• Evaluation
  – Some domains do not have a clear definition of good/bad conversation
• Interestingness
  – High proximity -> low email usage
• Implementation
  – management practices for productivity improvements, customer satisfaction, and a better competitive position

OVERVIEW OF OUR PROJECT (3/5)

Pattern discovery from dialogue

• Goal: Finding recurring events or event sequences in human face-to-face dialogues.
• Why?: Human communication skills are often experience- or assumption-based.
  – Enable smooth communication
  – Prevent problematic communication
• Task: Use machines to identify plausible hypotheses that humans cannot notice by observation

Target dialogue

• Psychotherapeutic Interview (Counseling)
  – Counseling at schools
  – Counseling at hospitals
• Increasing demand for therapists
• Shortage of qualified teachers
• Lack of effective training methods
• Therapist training setting (non-experimental)

Our Corpus (Private)

• Psychotherapeutic interview (counseling)
  – Training opportunity for students
• 25 dialogues (approx. 2 hrs each, 21 hrs in total)
• Adding more dialogues (3/year)

Recording and data format

Video Data
• Single camera
• Two microphones
• AVI -> MPEG
• Priority: minimize disturbance for participants

Transcript

Annotation

Multimodality

• Verbal cues are dominant in defining meanings (textual information)
• What is the impact of nonverbal cues such as gestures, eye-gaze, styles, timing, or context including social background?

Can gestures indicate misunderstandings?

• “Prediction of Misunderstanding from Gesture Patterns in Psychotherapy”, M. Inoue, R. Hanada, N. Furuyama, NII-2009-001E, Feb. 2009
• Negative result
  – We should rely on verbal content

Gestural Feature for Th & Cl

• Before/During/After the misunderstanding
• 5/10/50 sec. windows
• Frequency (x1; x2; x3)
• Frequency Difference (x4; x5)
• Duration (Mean & Max & Min) (x6; x7; x8)
• Mean Interval (x9)
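
A sketch of how the nine features listed above might be computed from annotated gesture intervals. The slide only names the features, so the exact definitions used here are assumptions: x1 to x3 are gesture counts before, during, and after the misunderstanding; x4 and x5 are differences between neighboring counts; durations and intervals are taken over all gestures starting in the three windows. The function name and data are hypothetical.

```python
def gesture_features(gestures, event_start, event_end, window):
    """Return [x1..x9] for one misunderstanding event (assumed definitions).

    gestures is a list of (start, end) times in seconds, sorted by start.
    """
    def count(lo, hi):
        return sum(1 for start, _ in gestures if lo <= start < hi)

    x1 = count(event_start - window, event_start)  # before
    x2 = count(event_start, event_end)             # during
    x3 = count(event_end, event_end + window)      # after
    x4, x5 = x2 - x1, x3 - x2                      # frequency differences

    in_scope = [(s, e) for s, e in gestures
                if event_start - window <= s < event_end + window]
    durations = [e - s for s, e in in_scope]
    # Interval = gap between the end of one gesture and the start of the next.
    gaps = [nxt[0] - cur[1] for cur, nxt in zip(in_scope, in_scope[1:])]

    x6 = sum(durations) / len(durations) if durations else 0.0
    x7 = max(durations, default=0.0)
    x8 = min(durations, default=0.0)
    x9 = sum(gaps) / len(gaps) if gaps else 0.0
    return [x1, x2, x3, x4, x5, x6, x7, x8, x9]

# Hypothetical gesture annotations; event at 10-15 s, 5 s window.
gestures = [(2, 3), (6, 8), (11, 12), (13, 16), (21, 22)]
print(gesture_features(gestures, event_start=10, event_end=15, window=5))
```

The same function could be called with `window` set to 5, 10, or 50 to mirror the three window sizes on the slide.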

Predictability by gestural cues

• Classification by linear discriminant analysis
  – Is there any feature that has a similar precision/recall tendency over different dialogues?
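
For reference, precision and recall for a binary "misunderstanding" prediction can be computed as below. This sketch covers only the evaluation measure, not the linear discriminant analysis itself, and the prediction vectors are hypothetical.

```python
def precision_recall(predicted, actual):
    """Precision and recall for binary labels (1 = misunderstanding)."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical classifier output and annotations for one dialogue.
predicted = [1, 1, 0, 1, 0, 0]
actual = [1, 0, 0, 1, 1, 0]
print(precision_recall(predicted, actual))
```

Computing this pair per feature and per dialogue gives the points that a precision/recall plot would compare across dialogues.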

[Figure: precision (P) versus recall (R) plots with points labeled 1, 2, and 3, one panel each for Dialogue 1 and Dialogue 2]


SPEECH-GESTURE INTERACTION (4/5)

Analysis of speech type patterns

– Understand how therapists speak to their clients based on speech type transition patterns

A taxonomy used in counseling domain

1. Closed question e.g., “Do you mean ~?”
2. Open question e.g., “Can you elaborate on that?”
3. Encouragement/Repeat e.g., “Go on.” “I see.”
4. Rephrase e.g., “So, you are thinking ~.”
5. Reflection of emotion
6. Reflection of meaning
7. Other

Relationship between speech and gesture

• Frequencies of speech types
• At the beginning or the end of dialogues
• How do speech patterns look different when gestures are taken into consideration?
• Speeches that co-occur with gestures vs. speeches without gestures
• Does the above division lead to any changes in the speech type transition patterns?
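
One way to compare transition patterns with and without gestures is to count speech type bigrams separately for the two groups. In this sketch the speech types are the taxonomy numbers 1 to 7, and a transition is assigned to the co-gesture group when the second utterance co-occurs with a gesture. That assignment rule, the function name, and the data are assumptions; the slides do not specify how the split was made.

```python
from collections import Counter

def transition_counts(utterances):
    """Count speech type transitions, split by gesture co-occurrence.

    utterances is a list of (speech_type, has_gesture) in dialogue order.
    Returns {True: Counter, False: Counter} keyed by (previous, current) type.
    """
    counts = {True: Counter(), False: Counter()}
    for (prev_type, _), (curr_type, has_gesture) in zip(utterances, utterances[1:]):
        counts[has_gesture][(prev_type, curr_type)] += 1
    return counts

# Hypothetical therapist utterances: (taxonomy number, co-occurs with gesture).
utterances = [(2, False), (3, True), (3, False), (4, True), (1, False), (3, True)]
counts = transition_counts(utterances)
print("co-gesture:", dict(counts[True]))
print("non-co-gesture:", dict(counts[False]))
```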

Speech type transition in the beginning of the dialogue

[Figure: speech type transition diagrams for Co-Gesture and Non-Co-Gesture speech, annotated with “Generic encouragement” and “Sequences beginning from questioning”]

Speech type transition in the ending part of the dialogue

[Figure: speech type transition diagrams for Co-Gesture and Non-Co-Gesture speech, annotated with “Sequence beginning from encouragement”, “Sequence beginning from question”, and “Question and rephrase”]

Speech type transition in the ending part of the dialogue (Beginner therapist)

[Figure: speech type transition diagrams for Co-Gesture and Non-Co-Gesture speech]

Reflection of therapists’ skill?

Summary

• Various speech sequence patterns can be interpreted as the techniques in dialogues.

• Patterns could be better understood when multimodality is taken into account.

• Discovered patterns could be used to assess the proficiency of therapists.

VERBAL CONTENT MISMATCH (5/5)

Mismatch between intention and perception over an utterance

• Therapists (Th) want to empower clients (Cl) by compliments.
• Clients want to be empowered by Th through their compliments.
• They share the same goal, but this process does not go well in reality.
  – Th tried a compliment but Cl did not notice it
  – Some complimentary expressions are uncomfortable to Cls
  – Th cannot figure out how Cls are praised

Compliment as a counseling technique

• Therapists learn the concept and necessity of compliment through lectures, but
  – There is not enough analysis of failures.
  – Concrete examples of expression are scarce.

As a result
• Inexperienced Th often fail to use compliment techniques successfully in actual interviews.

Analysis approach

• How mismatches arise in terms of vocabulary.
  – The focus is on what Ths say rather than how they say it.
• How the intention and perception differ over the word usage
  – Timing of the utterance is ignored.
• To understand the generic tendency, multiple dialogues are mixed together into a word pool.

Data preparation

• Transcripts based on the videos of psychotherapeutic interviews (13 pairs, 27 participants)
• They are assigned to the participants.
• Both Th and Cl highlight Th’s speech where Th conducted compliment (Th) or Cl was empowered (Cl).
• Highlighted speeches are extracted and put into the word pool.

Degree of discrepancy

• Number of highlighted speech by therapists:
  – 114 (M=8.1)
• Number of highlighted speech by clients:
  – 69 (M=4.6)
• Agreement:
  – 6% (11/183)

[Figure: overlap of Th marked (114), Cl marked (69), and both marked (11)]
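
The agreement figure can be reproduced from the three counts on the slide. The sketch below computes the ratio as reported (11 out of 114 + 69 = 183) and, as an alternative that is not on the slide, the Jaccard ratio over the union. The Jaccard variant assumes that the 11 doubly marked utterances are included in both the 114 and the 69. The function name is hypothetical.

```python
def agreement_ratios(th_marked, cl_marked, both_marked):
    """Return (ratio as on the slide, Jaccard ratio over the union)."""
    slide_ratio = both_marked / (th_marked + cl_marked)
    # Assumption: both_marked is counted inside th_marked and cl_marked.
    union = th_marked + cl_marked - both_marked
    jaccard = both_marked / union
    return slide_ratio, jaccard

slide_ratio, jaccard = agreement_ratios(th_marked=114, cl_marked=69, both_marked=11)
print(f"{slide_ratio:.1%} {jaccard:.1%}")
```

Both variants round to 6%, so the choice of denominator does not change the conclusion that agreement is low.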

Pre-processing

• Morphological analysis
• Replacement of words (fluctuation, removal of proper nouns for anonymity)
• Number of tokens: 4250
• Removal of low-frequency (tf<2) or single-document (df<2) words, focusing on the generic (cross-dialogue) expressions
• Number of vocabulary: 476 -> 113
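
The tf/df filtering step can be sketched as follows, where tf is the total count of a word across all dialogues and df is the number of dialogues that contain it. The thresholds follow the slide (tf<2 or df<2 removed); the function name and the toy English tokens are assumptions, since the original data went through morphological analysis first.

```python
from collections import Counter

def filter_vocabulary(documents, min_tf=2, min_df=2):
    """Keep words that occur at least min_tf times in at least min_df documents."""
    tf = Counter(word for doc in documents for word in doc)
    df = Counter(word for doc in documents for word in set(doc))
    return {word for word in tf if tf[word] >= min_tf and df[word] >= min_df}

# Hypothetical tokenized highlights, one list per dialogue.
documents = [
    ["feeling", "tough", "feeling"],
    ["feeling", "story"],
    ["tough", "role", "role"],
]
print(sorted(filter_vocabulary(documents)))
```

Here `role` has tf = 2 but appears in a single dialogue, so it is dropped as a dialogue-specific word.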

Frequent words

Overall           Therapist         Client
Word        TF    Word        TF    Word        TF
Say         64    Say         22    Say         42
Think       45    Thing       19    Very        30
Something   42    Think       18    Role        28
Role        41    That        18    Think       27
Very        40    Something   15    Something   27
Thing       39    Role        13    Well        25
Well        38    Well        13    Do          22
That        36    Do          11    Like this   22
Do          33    Great       10    Not         21
Like this   31    Not          9    Thing       20

Eliminate high frequency words

[Figure: word frequency (y-axis, 0 to 70) against word id (x-axis, 0 to 120) for total, th, and cl, with a threshold line]
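
Eliminating high frequency words amounts to dropping every word whose count exceeds a cutoff. In the sketch below the counts are taken from the overall columns of the word tables in these slides, but the threshold value of 30 is an assumption, since the slides do not state the cutoff that was used.

```python
def drop_high_frequency(tf, threshold):
    """Keep only words whose frequency does not exceed the threshold."""
    return {word: count for word, count in tf.items() if count <= threshold}

# Overall term frequencies for a few words from the tables.
overall_tf = {"Say": 64, "Something": 42, "Role": 41, "Feeling": 16, "Now": 16, "Story": 13}
mid_frequency = drop_high_frequency(overall_tf, threshold=30)  # hypothetical cutoff
print(sorted(mid_frequency))
```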

Mid frequency words

Overall           Therapist         Client
Word        TF    Word        TF    Word        TF
Feeling     16    Feeling      7    Now         13
Now         16    Say          6    Hmm         11
How         13    Talk         6    Story       10
Story       13    How          5    Feeling      9
Hmm         12    Become       5    Yes          8
Listen      12    Thing        5    How          8
Yes         10    Tough        5    Listen       8
Then        10    Listen       4    Think        8
So          10    Absent       4    I            8
Think       10    Hard         4    Enter        8

Summary

• Problem: Compliments used by therapists (Th) during counseling are not well accepted by clients (Cl).
• Data: 13 dialogue transcripts; utterances where Th intended a compliment technique and where Cl felt empowered by a compliment are marked.
• Analysis: To understand the mismatch at the vocabulary level, differences in usage are explored in terms of frequency.
  – Th tend to use the compliment technique to focus on the difficulties of the problem.
  – Cl may be empowered by words referring to internal mental status.
• Future direction: Understanding the process of resolving mismatches, taking the differences in therapist proficiency and dialogue topics into account.