Multimodal Dialogue Analysis
INOUE, Masashi
Yamagata University
29-Nov-09 @FIU Dr. Tao Li’s Group
Name of the discipline
• Computational Social Linguistics
– Society influences language use
• Conversation Analysis (CA)
• Discourse Analysis (DA)
2
OVERVIEW (1/5)
3
Layers of investigation
• Data
– Sensing (Objective)
– Device development and signal processing
• Information
– Event detection (Ambiguous)
– Pattern recognition
• Knowledge
– Pattern discovery (Subjective)
– Data mining
4
Major Conferences and Journals
• ICMI-MLMI
– ICMI (User Interface) and MLMI (Dialogue Analysis) merged in 2009
• Some in multimedia or NLP conferences
– ACM Multimedia
– ACL
– etc.
5
Research Initiatives In Europe
• CHIL Corpus
• AMI Corpus
– Augmented multi-party interaction
– http://corpus.amiproject.org/
• SSPNET
– A European network of excellence in social signal processing
– http://sspnet.eu/
6
FIRST EXAMPLE (2/5)
8
Paper 1 (ICMI-MLMI 2009)
• "Discovering group nonverbal conversational patterns with topics" by Dinesh Babu Jayagopi, Daniel Gatica-Perez (IDIAP)
• Goal: Understand group dynamics (= leadership) from conversational video
9
Method
• Feature descriptor
– Time slices of conversation (documents)
– Different time scales show different patterns: 1 min scale (monologue) vs. 5 min scale (a lot of interaction)
– Speaking energy/Speaking status
• Bag of non-verbal patterns (NVP)
– speech length, # of turns, successful interruptions
• Method (what's new)
– Unsupervised
– Topic model (LDA): which feature is prominent
10
Feature categories
1. Generic group patterns: group as a whole
– silence, one-speaker, two-speaker, other, evenly
2. Leadership patterns:
– proposed in social psychology field
– position of designated leader (‘L’) or someone else (‘NL’): taking maximum values
• 21 dimensional feature vectors (vocabulary)
• 6 tokens per slice (words)
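The idea of treating a time slice as a document and quantized nonverbal patterns as words can be sketched as follows. This is a minimal illustration of the generic group patterns only, assuming binary speaking status per participant per frame. The function names, the frame representation, and the `low`/`high` quantization thresholds are hypothetical and are not taken from the paper.

```python
from collections import Counter

def frame_pattern(statuses):
    """Label one frame by how many participants are speaking."""
    n = sum(statuses)
    if n == 0:
        return "silence"
    if n == 1:
        return "one-speaker"
    if n == 2:
        return "two-speaker"
    return "other"

def slice_to_words(frames, low=0.2, high=0.5):
    """Turn one time slice into a bag of discrete 'words'.

    frames: list of tuples of 0/1 speaking status, one tuple per frame.
    low, high: assumed thresholds on the fraction of frames (hypothetical).
    """
    counts = Counter(frame_pattern(f) for f in frames)
    words = []
    for pattern in ("silence", "one-speaker", "two-speaker", "other"):
        frac = counts[pattern] / len(frames)
        level = "low" if frac < low else "mid" if frac < high else "high"
        words.append(f"{pattern}:{level}")
    return words

# Toy slice with 4 participants and 5 frames (made-up values).
toy_slice = [(1, 0, 0, 0), (1, 0, 0, 0), (1, 1, 0, 0), (0, 0, 0, 0), (1, 0, 0, 0)]
bag = slice_to_words(toy_slice)
```

The resulting bags of words, one per slice, are the kind of input a topic model such as LDA expects; the topic model itself is not reproduced here.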
11
12
Data
• AMI Corpus
– Meeting for product design
– 17 meetings (17 hours)
– 4 participants / group: ‘Project Manager’, ‘User Interface specialist’, ‘Marketing Expert’, and ‘Industrial Designer’.
13
Result (3 topics)
14
Result (visual)
[Figure: 3 topics; can be used to characterize groups]
15
Validation
• Comparison with ground-truth (GT):
– 5 min scale, 8 top docs per class
– 3 annotators / meeting
– GT is the label agreed on by the majority
• Accuracy: 62%, 100%, 75% for each class
– Autocratic, Participative, Free rein
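The validation step (majority-vote ground truth, then accuracy per class) can be sketched as below. The data layout, the function names, and the tie handling (a tie yields no ground-truth label) are assumptions for illustration. The toy labels are made up and do not reproduce the reported figures.

```python
from collections import Counter, defaultdict

def majority_label(labels):
    """Return the label chosen by most annotators, or None on a tie."""
    ranked = Counter(labels).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]

def per_class_accuracy(predicted, annotations):
    """predicted: list of (doc_id, class); annotations: doc_id -> labels."""
    hits, totals = defaultdict(int), defaultdict(int)
    for doc_id, cls in predicted:
        totals[cls] += 1
        if majority_label(annotations[doc_id]) == cls:
            hits[cls] += 1
    return {cls: hits[cls] / totals[cls] for cls in totals}

# Hypothetical annotations from 3 annotators per document.
annotations = {
    "m1": ["Autocratic", "Autocratic", "Participative"],
    "m2": ["Participative", "Participative", "Participative"],
    "m3": ["Free rein", "Autocratic", "Free rein"],
    "m4": ["Autocratic", "Participative", "Free rein"],
}
predicted = [("m1", "Autocratic"), ("m2", "Participative"),
             ("m3", "Autocratic"), ("m4", "Free rein")]
accuracy = per_class_accuracy(predicted, annotations)
```

With 8 documents per class, the reported accuracies would be consistent with 5, 8, and 6 correct documents respectively, although the slide does not state the counts.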
16
Questions
• Feature representation (Are they good?)
– Some magic numbers (e.g., 6 words/slice)
– Balancing # of vocabulary and # of words
• Modeling technique (Is LDA a valid one?)
– Can we regard the NVC as words and Group Dynamics as topics?
– Arbitrary number of topics, different interpretation
17
EXAMPLE 2 (3/5)
18
Paper 2 (MSSSC 2009)
• "Sensor-Based Organizational Engineering" by Daniel Olguin-Olguin, Alex (Sandy) Pentland (MIT Media Lab)
• [16] Olguin-Olguin, D., & Pentland, A. (2008). Social Sensors for Automatic Data Collection. 14th Americas Conference on Information
• Social signals/Reality mining/Sensible organizations
– Introduction to their research projects
– Use of sensors to collect data in groups
– Combination of textual and survey data
– Business communication domain (organizational behavior)
19
Method
• Sensor data
– Face/body/vocal behavior/space and environment/affective behavior
– cameras, infrared sensors, accelerometers, gyroscopes, inclinometers, pressure sensors, microphones, vibration, ...
• Pattern recognition
• Social network analysis
– Who talks to whom
– How well they are communicating
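A "who talks to whom" network can be represented as weighted directed edges between people. The sketch below assumes the sensor pipeline has already produced `(speaker, listener, seconds)` records. That record format and the function names are hypothetical, not the authors' implementation.

```python
from collections import defaultdict

def talk_network(events):
    """Sum talking time per directed (speaker, listener) pair."""
    weights = defaultdict(float)
    for speaker, listener, seconds in events:
        weights[(speaker, listener)] += seconds
    return dict(weights)

def out_strength(weights):
    """Total time each person spends talking to others."""
    totals = defaultdict(float)
    for (speaker, _listener), seconds in weights.items():
        totals[speaker] += seconds
    return dict(totals)

# Made-up interaction records for three people.
toy_events = [("A", "B", 30.0), ("B", "A", 10.0), ("A", "C", 5.0), ("A", "B", 15.0)]
network = talk_network(toy_events)
strength = out_strength(network)
```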
20
Case 1
• Communication in a call center
– wearable sensor devices (sociometric badge)
– completion time difference (productivity)
– 2,200 hours of data (100 hours per employee) and 880 reciprocal e-mails
• Findings
– more interaction implied lower productivity
– higher variance in physical activity implied lower productivity
21
Case 2
• Communication in a marketing division
– face-to-face vs. emails
– questionnaire (satisfaction)
• Findings:
– Total comm = email + face-to-face
– Total comm correlates negatively with satisfaction
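The finding can be read as a correlation between a summed communication measure and a satisfaction score. The sketch below computes a Pearson correlation on made-up numbers. The variable names and values are hypothetical, and the slide does not say which correlation statistic the authors used.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up per-person counts and questionnaire scores.
email = [10, 20, 30, 40]
face_to_face = [5, 5, 10, 20]
satisfaction = [5, 4, 3, 1]

# Total comm = email + face-to-face, as defined on the slide.
total_comm = [e + f for e, f in zip(email, face_to_face)]
r = pearson(total_comm, satisfaction)
```

A negative `r` on real data would correspond to the reported tendency; the toy numbers here are constructed only to show the computation.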
22
Questions
• Evaluation
– Some domains do not have a clear definition of good/bad conversation
• Interestingness
– High proximity -> low email usage
• Implementation
– management practices for productivity improvements, customer satisfaction, and a better competitive position
24
OVERVIEW OF OUR PROJECT (3/5)
25
Pattern discovery from dialogue
• Goal: Finding recurring events or event sequences in human face-to-face dialogues.
• Why?: Human communication skills are often experience- or assumption-based.
– Enable smooth communication
– Prevent problematic communication
• Task: Use machines to identify plausible hypotheses that humans cannot notice by observation
26
Target dialogue
• Psychotherapeutic Interview (Counseling)
– Counseling at schools
– Counseling at hospitals
• Increasing demand for therapists
• Shortage of qualified teachers
• Lack of effective training methods
• Therapist training setting (non-experimental)
27
Our Corpus (Private)
• Psychotherapeutic interview (counseling)
– Training opportunity for students
• 25 dialogues (approx. 2 hrs each, 21 hrs in total)
• Adding more dialogues (3/year)
30
Recording and data format
Video Data
Single camera
Two microphones
AVI -> MPEG
Priority: minimize disturbance for participants
Transcript
Annotation
31
Multimodality
• Verbal cues are dominant in defining meaning (textual information)
• What is the impact of nonverbal cues such as gestures, eye gaze, styles, timing, or context including social background?
32
33
Can gestures indicate misunderstandings?
• “Prediction of Misunderstanding from Gesture Patterns in Psychotherapy”, M. Inoue, R. Hanada, N. Furuyama, NII-2009-001E, Feb. 2009
• Negative result
– We should rely on verbal content
34
Gestural Feature for Th & Cl
• Before/During/After the misunderstanding
• 5/10/50 sec. windows
• Frequency (x1; x2; x3)
• Frequency Difference (x4; x5)
• Duration (Mean & Max & Min) (x6; x7; x8)
• Mean Interval (x9)
35
Predictability by gestural cues
• Classification by linear discriminant analysis
– Is there any feature that has a similar precision/recall tendency across different dialogues?
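Precision and recall for a binary "misunderstanding" prediction can be computed as below, once per feature and per dialogue, so that the tendencies can be compared. This is only a sketch of the evaluation measure. The classifier itself is not reproduced, and the function name and toy labels are hypothetical.

```python
def precision_recall(predicted, actual):
    """Precision and recall for binary labels (1 = misunderstanding)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Made-up predictions and annotations for five segments.
p, r = precision_recall([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```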
36
[Figure: precision (P) and recall (R) plots with points labeled 1, 2, 3; one panel each for Dialogue 1 and Dialogue 2]
SPEECH-GESTURE INTERACTION (4/5)
37
Analysis of speech type patterns
– Understand how therapists speak to their clients, based on speech type transition patterns
38
1. Closed question, e.g., “Do you mean ~?”
2. Open question, e.g., “Can you elaborate on that?”
3. Encouragement/Repeat, e.g., “Go on.” “I see.”
4. Rephrase, e.g., “So, you are thinking ~.”
5. Reflection of emotion
6. Reflection of meaning
7. Other
A taxonomy used in counseling domain
Relationship between speech and gesture
• Frequencies of speech types
– At the beginning or the end of dialogues
• How do speech patterns look different when gestures are taken into consideration?
– Speeches that co-occur with gestures vs. speeches without gestures
• Does the above division lead to any changes in the speech type transition patterns?
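Speech type transition patterns can be counted as pairs of consecutive speech types, kept separately for co-gesture and non-co-gesture speech. The sketch below assumes each utterance is already labeled with a speech type and a gesture flag, and it assigns a transition to the group of its first utterance. That assignment rule and the function name are assumptions, since the slide does not specify them.

```python
from collections import Counter

def transition_counts(utterances):
    """Count (current type, next type) pairs, split by gesture flag.

    utterances: list of (speech_type, has_gesture) in dialogue order.
    A transition is filed under the gesture flag of its first utterance.
    """
    counts = {True: Counter(), False: Counter()}
    for (cur_type, cur_gesture), (next_type, _) in zip(utterances, utterances[1:]):
        counts[cur_gesture][(cur_type, next_type)] += 1
    return counts

# Made-up therapist utterance sequence.
toy_utterances = [
    ("open question", True),
    ("encouragement", False),
    ("encouragement", False),
    ("rephrase", True),
    ("closed question", False),
]
counts = transition_counts(toy_utterances)
```

Comparing `counts[True]` with `counts[False]` then shows whether the division changes which transitions dominate.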
39
Speech type transition in the beginning of the dialogue
[Figure: Co-Gesture and Non-Co-Gesture panels; annotations: generic encouragement, sequences beginning from questioning]
40
Speech type transition in the ending part of the dialogue
[Figure: Co-Gesture and Non-Co-Gesture panels; annotations: sequence beginning from encouragement, sequence beginning from question, question and rephrase]
41
Speech type transition in the ending part of the dialogue (Beginner therapist)
[Figure: Co-Gesture and Non-Co-Gesture panels; annotation: Reflection of therapists’ skill?]
42
Summary
• Various speech sequence patterns can be interpreted as the techniques in dialogues.
• Patterns could be better understood when multimodality is taken into account.
• Discovered patterns could be used to assess the proficiency of therapists.
43
VERBAL CONTENT MISMATCH (5/5)
44
Mismatch between intention and perception over an utterance
• Therapists (Th) want to empower clients (Cl) by compliments.
• Clients want to be empowered by Th through their compliments.
• They share the same goal, but this process does not go well in reality.
– Th tried a compliment but Cl did not notice it
– Some complimentary expressions are uncomfortable to Cls
– Th cannot figure out how Cls are praised
45
Compliment as a counseling technique
• Therapists learn the concept and necessity of compliments through lectures, but
– There is not enough analysis of failures.
– Concrete examples of expressions are scarce.
As a result
• Inexperienced Th often fail to use compliment techniques in actual interviews.
46
Analysis approach
• How do mismatches arise in terms of vocabulary?
– The focus is on what Ths say rather than how they say it.
• How do intention and perception differ in word usage?
– Timing of the utterances is ignored.
• To understand the generic tendency, multiple dialogues are mixed together into a word pool.
47
Data preparation
• Transcripts based on the videos of psychotherapeutic interviews (13 pairs, 27 participants)
• The transcripts are given to the participants.
• Both Th and Cl highlight the parts of Th’s speech where Th intended a compliment (marked by Th) or Cl felt empowered (marked by Cl).
• Highlighted speeches are extracted and put into the word pool.
48
Degree of discrepancy
• Number of highlighted speeches by therapists:
– 114 (M=8.1)
• Number of highlighted speeches by clients:
– 69 (M=4.6)
• Agreement:
– 6% (11/183)
[Figure: overlap of Th marked (114), Cl marked (69), and Both marked (11)]
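The agreement figure follows directly from the three counts. The short calculation below reproduces the 11/183 ratio from the slide and, for comparison, a Jaccard-style overlap that counts each jointly marked utterance once. The second measure is an added illustration, not something reported in the talk, and it assumes the 11 jointly marked utterances are included in both the 114 and the 69.

```python
th_marked = 114    # utterances highlighted by therapists
cl_marked = 69     # utterances highlighted by clients
both_marked = 11   # utterances highlighted by both

# Ratio used on the slide: 11 / (114 + 69) = 11 / 183.
agreement = both_marked / (th_marked + cl_marked)

# Alternative (assumption): count each jointly marked utterance once.
distinct = th_marked + cl_marked - both_marked
jaccard = both_marked / distinct

print(f"agreement = {agreement:.1%}")
print(f"Jaccard overlap = {jaccard:.1%}")
```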
49
Pre-processing
• Morphological analysis
• Replacement of words (normalizing spelling variants, removing proper nouns for anonymity)
• Number of tokens: 4250
• Removal of low-frequency (tf<2) or single-document (df<2) words, to focus on generic (cross-dialogue) expressions
• Vocabulary size: 476 -> 113
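The tf/df filter can be sketched as follows, assuming each dialogue has already been tokenized by morphological analysis into a list of words. The function name and the toy documents are hypothetical. The thresholds mirror the slide: keep a word only if tf >= 2 and df >= 2.

```python
from collections import Counter

def filter_vocabulary(documents, min_tf=2, min_df=2):
    """Keep words with term frequency >= min_tf and document frequency >= min_df."""
    tf = Counter(word for doc in documents for word in doc)
    df = Counter(word for doc in documents for word in set(doc))
    return sorted(w for w in tf if tf[w] >= min_tf and df[w] >= min_df)

# Made-up tokenized dialogues.
toy_docs = [
    ["say", "think", "role", "say"],
    ["say", "role", "tough"],
    ["think", "hmm", "hmm"],
]
vocabulary = filter_vocabulary(toy_docs)
```

In the toy data, "hmm" occurs twice but only in one dialogue, so it is dropped by the df condition even though it passes the tf condition.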
50
Frequent words
Overall          Therapist        Client
Word       TF    Word       TF    Word       TF
Say        64    Say        22    Say        42
Think      45    Thing      19    Very       30
Something  42    Think      18    Role       28
Role       41    That       18    Think      27
Very       40    Something  15    Something  27
Thing      39    Role       13    Well       25
Well       38    Well       13    Do         22
That       36    Do         11    Like this  22
Do         33    Great      10    Not        21
Like this  31    Not         9    Thing      20
51
Eliminate high frequency words
[Figure: frequency (y-axis, 0 to 70) vs. word id (x-axis, 0 to 120) for total, th, and cl, with a threshold line]
52
Mid frequency words
Overall          Therapist        Client
Word       TF    Word       TF    Word       TF
Feeling    16    Feeling     7    Now        13
Now        16    Say         6    Hmm        11
How        13    Talk        6    Story      10
Story      13    How         5    Feeling     9
Hmm        12    Become      5    Yes         8
Listen     12    Thing       5    How         8
Yes        10    Tough       5    Listen      8
Then       10    Listen      4    Think       8
So         10    Absent      4    I           8
Think      10    Hard        4    Enter       8
53
Summary
• Problem: Compliments used by therapists (Th) during counseling are not well accepted by clients (Cl).
• Data: 13 dialogue transcripts; utterances where Th intended the compliment technique and where Cl felt empowered by a compliment are marked.
• Analysis: To understand the mismatch at the vocabulary level, differences in usage are explored in terms of frequency.
– Th tend to use the compliment technique to focus on the difficulties of the problem.
– Cl may be empowered by words referring to internal mental states.
• Future direction: Understanding the process of resolving mismatches, taking into account differences in therapist proficiency and dialogue topics.
54