[Event Series] Emotion-AI: Realizing Emotion Recognition with Artificial Intelligence


TRANSCRIPT

  • Emotion-AI:

    December 16th, 2017

    1

  • Affective Computing (1995):

    the study and development of systems and devices that can recognize, interpret, process, and simulate human affects

    Professor Rosalind Picard, MIT Media Lab

    Annual Conference on Affective Computing and Intelligent Interaction (ACII)

    ACII 2017 @ San Antonio

    2

  • 3

  • ?

    4

People also smile when they are miserable. Paul Ekman, Telling Lies: Clues to Deceit in the Marketplace, Politics, and Marriage

    5

  • 6

  • Philosophy: discussing emotion through philosophy

    Turn to Practical: combining the physical and emotion, and starting to apply such systems to humans

    Cognitive Process: Cognitive Theory

    Mind-Body Dualism: combining the physical world with emotion

    Modern Theory

    7

  • https://aquileana.wordpress.com/2014/04/14/platos-phaedrus-the-allegory-of-the-chariot-and-the-tripartite-nature-of-the-soul/

    Plato's horses

    Successful person: the reason horse is more in control

    Plato described emotion and reason as two horses pulling us in opposite directions.

    Philosophy: discussing emotion through philosophy

    8

  • Stoicism Aristippus

Philosophy: discussing emotion through philosophy

    9

  • Mind-Body Dualism

    In the 17th century, René Descartes viewed the body's emotional apparatus as largely hydraulic. He believed that when a person felt angry or sad it was because certain internal valves opened and released such fluids as bile and phlegm.

    Mind-Body Dualism: combining the physical world with emotion

    10

  • Charles Darwin believed that emotions were beneficial for evolution because emotions improved chances of survival. For example, the brain uses emotion to keep us away from a dangerous animal (fear), away from rotting food and fecal matter (disgust), in control of our resources (anger), and in pursuit of a good meal or a good mate (pleasure and lust).

    Damasio, Antonio R. Looking for Spinoza: Joy, Sorrow, and the Feeling Brain. New York NY: Harcourt, Inc., 2003.

Turn to Practical: discussing the combination of the physical and emotion, and starting to apply such systems to humans

    11

  • James, William. 1884. "What Is an Emotion?" Mind. 9, no. 34: 188-205.

    Our feeling of the same changes as they occur is the emotion

    Modern Theory

    12

  • James-Lange / Cannon-Bard / Schachter & Singer

    13

  • James-Lange / Cannon-Bard / Schachter & Singer

    James-Lange theory

    Two-factor theory of emotion (Schachter & Singer)

    Cannon-Bard theory: hypothalamus and limbic system

    14


  • Cognitive Process: Cognitive Theory

    21

  • James-Lange

    22

  • Arnold, Lazarus: Appraisal theory

    Tomkins, Izard: discrete (basic) emotion theory

    23



  • (Framework diagram; labels not preserved) INFERENCE?

    26

  • ?

    TAG?

    27

  • Charles Darwin believed that emotions were beneficial for evolution because emotions improved chances of survival. For example, the brain uses emotion to keep us away from a dangerous animal (fear), away from rotting food and fecal matter (disgust), in control of our resources (anger), and in pursuit of a good meal or a good mate (pleasure and lust).

    Damasio, Antonio R. Looking for Spinoza: Joy, Sorrow, and the Feeling Brain. New York NY: Harcourt, Inc., 2003.

    28

  • 100

    Paul Ekman

    29

  • 30

  • Are There Universal Facial Expressions? Just guess

    31

  • Facial Action Coding System (FACS)

    Action Units (AU)

    32

  • Mascolo, M. F., Fischer, K. W., & Li,J. (2003). Dynamic development of component system of emotions: Pride, shame, and guilt in China and the United States. In R. J. Davidson, K. R. Scherer, & H. H. Goldsmith (Eds.), Handbook of affective sciences (pp. 375-408). New York: Oxford University Press.Shaver, P. R., Wu, S., & Schwartz, J. C. (1992). Cross-cultural similarities and differences in emotion and its representation: A prototype approach. In Clark, M. S. (Ed.), Review of Personality and Social Psychology, 13, pp. 231-251. Sage: Thousand Oaks.

    33


  • ?

    35

    label ?

  • there is no limit to the number of possible different emotions

    William James

    36

  • Silvan Tomkins (1962) concluded that there are eight basic emotions:

    surprise, interest, joy, rage, fear, disgust, shame, and anguish

    Carroll Izard (University of Delaware, 1993) labeled 12 discrete emotions: Interest, Joy, Surprise, Sadness, Anger, Disgust, Contempt, Self-Hostility, Fear, Shame, Shyness, and Guilt

    measured with the Differential Emotions Scale (DES-IV)

    37

  • Ekman 1972 ()

    38

  • Dimensional models of emotion

    Define emotions according to one or more dimensions

    Wilhelm Max Wundt (1897), three dimensions: "pleasurable versus unpleasurable", "arousing or subduing", and "strain or relaxation"

    Harold Schlosberg (1954), three dimensions of emotion: "pleasantness-unpleasantness", "attention-rejection", and "level of activation"

    Prevalent models incorporate valence and arousal dimensions

    39

  • Circumplex model; Vector model; Positive activation - negative activation (PANA) model; Plutchik's model; PAD emotional state model; Lövheim cube of emotion; Cowen & Keltner 2017

    40

  • Circumplex model: Perceptual

    Developed by James Russell (1980): a two-dimensional circular space containing arousal and valence dimensions

    Arousal represents the vertical axis and valence represents the horizontal axis

    Prevalently used as labels (see the sketch below)

    41
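    A minimal sketch (not from the slides) of mapping circumplex valence/arousal coordinates to coarse quadrant labels; the [-1, 1] value range and the example label wording are assumptions:

    # Map a (valence, arousal) point, assumed to lie in [-1, 1] x [-1, 1],
    # onto one of the four quadrants of Russell's circumplex model.
    def circumplex_quadrant(valence: float, arousal: float) -> str:
        if valence >= 0 and arousal >= 0:
            return "high-arousal positive (e.g. excited, happy)"
        if valence < 0 and arousal >= 0:
            return "high-arousal negative (e.g. angry, afraid)"
        if valence < 0 and arousal < 0:
            return "low-arousal negative (e.g. sad, bored)"
        return "low-arousal positive (e.g. calm, relaxed)"

    print(circumplex_quadrant(0.7, 0.6))    # high-arousal positive
    print(circumplex_quadrant(-0.4, -0.3))  # low-arousal negative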

  • Positive activation - Negative activation (PANA): Self Report

    Created by Watson and Tellegen in 1985; suggests that positive affect and negative affect are two separate systems (responsible for different functions)

    States of higher arousal tend to be defined by their valence

    States of lower arousal tend to be more neutral in terms of valence

    The vertical axis represents low to high positive affect; the horizontal axis represents low to high negative affect

    The dimensions of valence and arousal lie at a 45-degree rotation over these axes

    42

  • 43

  • Cowen & Keltner

    2017, University of California, Berkeley researchers Alan S. Cowen & Dacher Keltner (PNAS)

    27 distinct emotions: http://news.berkeley.edu/2017/09/06/27-emotions/

    (A.) Admiration. (B.) Adoration. (C.) Aesthetic appreciation. (D.) Amusement. (E.) Anger. (F.) Anxiety. (G.) Awe. (H.) Awkwardness. (I.) Boredom. (J.) Calmness. (K.) Confusion. (L.) Craving. (M.) Disgust. (N.) Empathic pain. (O.) Entrancement. (P.) Excitement. (Q.) Fear. (R.) Horror. (S.) Interest. (T.) Joy. (U.) Nostalgia. (V.) Relief. (W.) Romance. (X.) Sadness. (Y.) Satisfaction. (Z.) Sexual desire. Surprise.

    44

    http://news.berkeley.edu/2017/09/06/27-emotions/

  • :

    Affective Computing

    Many Theories

    Many Models/Annotations

    Take Away? Stable

    45

  • (Framework diagram) Data-driven AI Learning and Inference?

    46

  • ?

    47

  • Affective Computing

    reference: https://www.gartner.com/newsroom/id/3412017/

    fast growing, but still not a mature technique

    48

  • Affective Computing modalities: Face, Speech, Body Gesture, Physiology, Language, Multi-Modal

    reference: http://blog.ventureradar.com/2016/09/21/15-leading-affective-computing-companies-you-should-know/

    49

  • Education, Health Care, Gaming, Advertisement, Retail, Legal

    Emotion Recognition as part of a larger system (API, SDK)

    50

  • :

    51

  • Little Dragon (Affectiva: Education)

    make learning more enjoyable and more effective, by providing an educational tool that is both universal and personalized

    reference: https://www.affectiva.com/success-story/

    https://www.youtube.com/watch?v=SmjAa8iMkjU

    52

  • 53

  • Nevermind (Affectiva: Gaming)

    bio-feedback horror game

    senses a player's facial expressions for signs of emotional distress, and adapts game play accordingly

    reference: https://www.affectiva.com/success-story/

    https://www.youtube.com/watch?v=NGr0orAqRH4&t=497s

    54

  • Brain Power (Affectiva: Health Care)

    The World's First Augmented Reality Smart-Glass System to empower children and adults with autism to teach themselves crucial social and cognitive skills.

    reference: https://www.affectiva.com/success-story/

    https://www.youtube.com/watch?v=qfoTprgWyns

    55

  • 56

  • MediaRebel (Affectiva: Legal)

    Legal video deposition management platform MediaRebel uses Affectiva's Emotion SDK for facial expression analysis and emotion recognition.

    Intelligent analytical features include: search transcripts based upon witness emotions; instantly play back testimony based upon selected emotions; identify positive, negative & neutral witness behavior

    reference: https://www.affectiva.com/success-story/

    https://www.mediarebel.com/

    57

  • shelfPoint (Affectiva: Retail)

    Cloverleaf is a retail technology company for the modern brick-and-mortar marketer and merchandiser

    shelfPoint solution: brands and retailers can now capture customer engagement and sentiment data at the moment of purchase decision, something previously unavailable in physical retail stores.

    reference: https://www.affectiva.com/success-story/

    https://www.youtube.com/watch?v=S9gDqpF6kLs

    https://www.youtube.com/watch?v=W6UnahO_zXs

    58

  • 59

  • :

    60

  • (Framework diagram) Data-driven AI Learning and Inference?

    61

  • ?

    ?

    62

  • 63

    Year Database Language Setting Protocol Elicitation

    1997 DES Dan. Single Scr. Induced

    2000 GEMEP Fre. Single Scr. & Spo. Acted

    2005 eNTERFACE' 05 Eng. Single Scr. Induced

    2007 HUMAINE Eng. TV Talk Scr. & Spo. Mix.

    2008 VAM Ger. TV Talk Spo. Acted

    2008 IEMOCAP Eng. Dyadic Scr. & Spo. Acted

    2009 SAVEE Eng. Single Spo. Acted

    2010 CIT Eng. Dyadic Scr. & Spo. Acted

    2010 SEMAINE Eng. Dyadic Scr. Mix.

    2013 RECOLA Fre. Dyadic Spo. Acted

    2016 CHEAVD Chi. TV talk Spo. Posed

    2017 NNIME Chi. Dyadic Spo. Acted

  • Language: Danish; Participants: 4 (Male: 2; Female: 2)

    Recordings: Audio; Total: 0.5 hours; Sentences: 5200 utterances

    Labels: Perspectives: Naïve-Observer; Raters: 20; Discrete session-level annotation

    Categorical (5)

    DES: Design, Recording and Verification of a Danish Emotional Speech Database

    64

    Engberg, Inger S., et al. "Design, recording and verification of a Danish emotional speech database." Fifth European Conference on Speech Communication and Technology. 1997.

    Available: Tom Brøndsted ([email protected])

  • DES

    Loss-Scaled Large-Margin Gaussian Mixture Models for Speech Emotion Classification1

    (Cat.:0.676)

    Automatic emotional speech classification2

    (Cat.:0.516)

    65

    1Yun, Sungrack, and Chang D. Yoo. "Loss-scaled large-margin Gaussian mixture models for speech emotion classification."IEEE Transactions on Audio, Speech, and Language Processing20.2 (2012): 585-598.

    2Ververidis, Dimitrios, Constantine Kotropoulos, and Ioannis Pitas. "Automatic emotional speech classification." Acoustics, Speech, and Signal Processing, 2004. Proceedings.(ICASSP'04). IEEE International Conference on. Vol. 1. IEEE, 2004..

  • Language: French; Participants: 10 (Male: 5; Female: 5)

    Recordings: Dual-channel Audio; HD Video; Manual Transcript; Face & Head; Body Posture & Gestures

    Sentences: 7300 sequences

    Labels: Perspectives: Naïve-Observer; Discrete session-level annotation; Categorical (18)

    GEMEP: Geneva Multimodal Emotion Portrayals corpus

    66

    Bänziger, Tanja, Hannes Pirker, and K. Scherer. "GEMEP - GEneva Multimodal Emotion Portrayals: A corpus for the study of multimodal emotional expressions." Proceedings of LREC. Vol. 6. 2006.

    Bänziger, Tanja, and Klaus R. Scherer. "Using actor portrayals to systematically study multimodal emotion expression: The GEMEP corpus." International conference on affective computing and intelligent interaction. Springer, Berlin, Heidelberg, 2007.

    Available: Tanja Bänziger (Tanja.Banziger@ pse.unige.ch)

  • GEMEP

    Multimodal emotion recognition from expressive faces, body gestures and speech

    (Cat.: 0.571)

    67

    Kessous, Loic, Ginevra Castellano, and George Caridakis. "Multimodal emotion recognition in speech-based interaction using facial expression, body gesture and acoustic analysis." Journal on Multimodal User Interfaces 3.1 (2010): 33-48.

  • Language: English; Participants: 42 (Male: 34; Female: 24) (14 different nationalities)

    Recordings: Dual-channel Audio; HD Video; Script

    Total: 1166 video sequences

    Emotion-related atmosphere: to express six emotions

    eNTERFACE' 05: The eNTERFACE'05 Audio-Visual Emotion Database

    68

    Martin, Olivier, et al. "The enterface05 audio-visual emotion database." Data Engineering Workshops, 2006. Proceedings. 22nd International Conference on. IEEE, 2006.

    Available: O. Martin ([email protected])

  • eNTERFACE' 05 Sparse autoencoder-

    based feature transfer learning for speech emotion recognition1

    (Cat.: 59.1)

    Unsupervised learning in cross-corpus acoustic emotion recognition2

    (Val./Act.:0.574/0.616)

    69

    1Deng, Jun, et al. "Sparse autoencoder-based feature transfer learning for speech emotion recognition." Affective Computing and Intelligent Interaction (ACII), 2013 Humaine Association Conference on. IEEE, 2013.

    2Zhang, Zixing, et al. "Unsupervised learning in cross-corpus acoustic emotion recognition." Automatic Speech Recognition and Understanding (ASRU), 2011 IEEE Workshop on. IEEE, 2011.

  • Language: English; Participants: many (includes 8 datasets)

    Recordings (naturalistic (TV shows, interviews) / induced data): Audio; Video; Gesture; Emotion words

    Labels: Perspectives: Naïve-Observer; Raters: 4; Continuous-in-time annotation

    Dimensional (8) [Intensity, Activation, Valence, Power, Expect, Word]; Discrete annotation (5)

    Emotion-related states; Key Events; Everyday Emotion words

    HUMAINE: Addressing the Collection and Annotation of Naturalistic and Induced Emotional Data

    70

    Douglas-Cowie, Ellen, et al. "The HUMAINE database: addressing the collection and annotation of naturalistic and induced emotional data." Affective computing and intelligent interaction (2007): 488-500.

    Available: [email protected]

  • HUMAINE

    A Multimodal Database for Affect Recognition and Implicit Tagging1

    (Val./Act.:0.761/0.677)

    Abandoning Emotion Classes - Towards Continuous Emotion Recognition with Modelling of Long-Range Dependencies2

    (Val./Act.[MSE]:0.18/0.08)

    71

    1Soleymani, Mohammad, et al. "A multimodal database for affect recognition and implicit tagging." IEEE Transactions on Affective Computing 3.1 (2012): 42-55.

2Wöllmer, Martin, et al. "Abandoning emotion classes - towards continuous emotion recognition with modelling of long-range dependencies." Ninth Annual Conference of the International Speech Communication Association. 2008.

  • Language: German (TV shows); Participants: 47

    Recordings: Audio; Video; Face; Manual Transcript

    Total: 12 hours; Sentences: 946 utterances

    Labels: Perspectives: Peer, Director, Self, Naïve-Observer; Raters: 17; Continuous-in-time annotation

    Dimensional (Valence-Activation-Dominance) for Audio; Discrete session-level annotation

    Categorical (7) for Faces

    VAM: The Vera am Mittag German Audio-Visual Spontaneous Speech Database

    72

    Grimm, Michael, Kristian Kroschel, and Shrikanth Narayanan. "The Vera am Mittag German audio-visual emotional speech database." Multimedia and Expo, 2008 IEEE International Conference on. IEEE, 2008.

    Available: [email protected]

  • VAM

    Towards robust spontaneous speech recognition with emotional speech adapted acoustic models1

    (Word ACC.: 42.75)

    Selecting training data for cross-corpus speech emotion recognition: Prototypicality vs. generalization Speech Adapted Acoustic Models2

    (Val./Act.: 0.502/0.677)

    73

    1Vlasenko, Bogdan, Dmytro Prylipko, and Andreas Wendemuth. "Towards robust spontaneous speech recognition with emotional speech adapted acoustic models." Poster and Demo Track of the 35th German Conference on Artificial Intelligence, KI-2012, Saarbrucken, Germany. 2012.

2Schuller, Björn, et al. "Selecting training data for cross-corpus speech emotion recognition: Prototypicality vs. generalization." Proc. 2011 Afeka-AVIOS Speech Processing Conference, Tel Aviv, Israel. 2011.

  • Language: English; Participants: 10 (Male: 5; Female: 5)

    Recordings: Dual-channel Audio; HD Video; Manual Transcript; 53-Marker Motion Capture (Face and Head)

    Total: 12 hours, 50 sessions (3 min/session); Sentences: 6904 sentences

    Labels: Perspectives: Naïve-Observer, Self (6/10); Raters: 6; Continuous-in-time annotation

    Dimensional (Valence-Activation-Dominance); Discrete session-level annotation

    Categorical (5)

    IEMOCAP: The Interactive Emotional Dyadic Motion Capture database

    74

    Busso, Carlos, et al. "IEMOCAP: Interactive emotional dyadic motion capture database." Language resources and evaluation 42.4 (2008): 335.

    Available: Anil Ramakrishna ([email protected])

  • IEMOCAP Tracking continuous

    emotional trends of participants during affective dyadicinteractions using body language and speech information1

    (Val./Act./Dom.:0.619/0.637/0.62)

    Modeling mutual influence of interlocutor emotion states in dyadic spoken interactions2

    (Cat./Val./Act.:0.552/0.634/0.650)

    75

    1Metallinou, Angeliki, Athanasios Katsamanis, and Shrikanth Narayanan. "Tracking continuous emotional trends of participants during affective dyadic interactions using body language and speech information." Image and Vision Computing 31.2 (2013): 137-152.

    2Lee, Chi-Chun, et al. "Modeling mutual influence of interlocutor emotion states in dyadic spoken interactions." Tenth Annual Conference of the International Speech Communication Association. 2009.

  • Language: English; Participants: 4 (Male: 4)

    Recordings: Dual-channel Audio; Video; Face Markers

    Sentences: 480 utterances

    Labels: Perspectives: Naïve-Observer; Discrete session-level annotation

    Categorical (6)

    SAVEE: Surrey Audio-Visual Expressed Emotion database

    76

    Jackson, P., and S. Haq. "Surrey Audio-Visual Expressed Emotion(SAVEE) Database." University of Surrey: Guildford, UK (2014).

    Available: P Jackson ([email protected])

  • SAVEE

    Speaker-Dependent Audio-Visual Emotion Recognition1 (Cat.: 97.5)

    Audio-Visual Feature Selection and Reduction for Emotion Classification2 (Cat.: 96.7)

    77

    1S. Haq and P.J.B. Jackson. "Speaker-Dependent Audio-Visual Emotion Recognition." In Proc. Int'l Conf. on Auditory-Visual Speech Processing, pages 53-58, 2009.

    2S. Haq, P.J.B. Jackson, and J.D. Edge. "Audio-Visual Feature Selection and Reduction for Emotion Classification." In Proc. Int'l Conf. on Auditory-Visual Speech Processing, pages 185-190, 2008.

  • Language: English; Participants: 16 (Male: 7; Female: 9)

    Recordings: Dual-channel Audio; HD Video; Transcript; Body gesture

    Total: 48 dyadic sessions; Sentences: 2162 sentences

    Labels: Perspectives: Naïve-Observer; Raters: 3; Discrete session-level annotation; Continuous-in-time annotation

    Dimensional (Valence-Activation-Dominance)

    CIT: The USC CreativeIT database of multimodal dyadic interactions: from speech and full body motion capture to continuous emotional annotations

    78

    Metallinou, Angeliki, et al. "The USC CreativeIT database: A multimodal database of theatrical improvisation." Multimodal Corpora: Advances in Capturing, Coding and Analyzing Multimodality (2010): 55.

    Metallinou, Angeliki, et al. "The USC CreativeIT database of multimodal dyadic interactions: From speech and full body motion capture to continuous emotional annotations." Language resources and evaluation 50.3 (2016): 497-521.

    Available: Manoj Kumar ([email protected])

  • CIT

    Analyzing Temporal Dynamics of Dyadic Synchrony in Affective Interactions2

    79

    1Yang, Zhaojun, and Shrikanth S. Narayanan. "Modeling dynamics of expressive body gestures in dyadic interactions." IEEE Transactions on Affective Computing 8.3 (2017): 369-381.

    2Yang, Zhaojun, and Shrikanth S. Narayanan. "Analyzing Temporal Dynamics of Dyadic Synchrony in Affective Interactions." INTERSPEECH. 2016.

    3Chang, Chun-Min, and Chi-Chun Lee. "Fusion of multiple emotion perspectives: Improving affect recognition through integrating cross-lingual emotion information." Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on. IEEE, 2017.

  • Language: English; Participants: 150

    Recordings: Dual-channel Audio; HD Video; Manual Transcript

    Multi-Interaction (like a TV talk show): Human vs. Human; Semi-human vs. Human; Machine vs. Human

    Total: 959 dyadic sessions (3 min/session)

    Labels: Perspectives: Naïve-Observer; Raters: 8; Continuous-in-time annotation

    Dimensional (Valence-Activation); Discrete Categorical (27)

    SEMAINE: The SEMAINE Database: Annotated Multimodal Records of Emotionally Colored Conversations between a Person and a Limited Agent

    80

    McKeown, Gary, et al. "The semaine database: Annotated multimodal records of emotionally colored conversations between a person and a limited agent." IEEE Transactions on Affective Computing 3.1 (2012): 5-17.

    Available: [email protected]

  • SEMAINE Building autonomous

    sensitive artificial listeners1

    A Dynamic Appearance Descriptor Approach to Facial Actions Temporal Modeling2

    (0.701)

    81

    1Schroder, Marc, et al. "Building autonomous sensitive artificial listeners." IEEE Transactions on Affective Computing 3.2 (2012): 165-183.

    2Jiang, Bihan, et al. "A dynamic appearance descriptor approach to facial actions temporal modeling." IEEE transactions on cybernetics 44.2 (2014): 161-174.

  • Language: French; Participants: 46 (Male: 19; Female: 27)

    Recordings: Dual-channel Audio; HD Video (15 facial action units); Electrocardiogram; Electrodermal activity

    Total: 11 hours, 102 dyadic sessions (3 min/session); Sentences: 1306 sentences

    Labels: Perspectives: Self, Naïve-Observer; Raters: 6; Continuous-in-time annotation

    Dimensional (Valence-Activation)

    RECOLA: Remote Collaborative and Affective Interactions

    82

    Ringeval, Fabien, et al. "Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions." Automatic Face and Gesture Recognition (FG), 2013 10th IEEE International Conference and Workshops on. IEEE, 2013.

    Available: Fabien Ringeval ([email protected])

  • RECOLA Prediction of asynchronous

    dimensional emotion ratings from audiovisual and physiological data1

    (Val./Act.: 0.804/0.528 )

    End-to-end speech emotion recognition using a deep convolutional recurrent network2

    (Val./Act.: 0.741/0.325 )

    Face Reading from SpeechPredicting Facial Action Units from Audio Cues3

    (Predict Facial Action Units from Audio Cues: 0.650 )

    83

    1Ringeval, Fabien, et al. "Prediction of asynchronous dimensional emotion ratings from audiovisual and physiological data." Pattern Recognition Letters 66 (2015): 22-30.2Trigeorgis, George, et al. "Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network." Acoustics, Speech and Signal

    Processing (ICASSP), 2016 IEEE International Conference on. IEEE, 2016.3Ringeval, Fabien, et al. "Face Reading from SpeechPredicting Facial Action Units from Audio Cues." Sixteenth Annual Conference of the International Speech

    Communication Association. 2015.

  • Language: Chinese; Participants: 238

    Recordings: Audio; Video (34 films, 2 TV series, 4 TV shows)

    Total: 2.3 hours

    Labels: Raters: 4; Discrete session-level annotation

    Fake/suppressed emotions; Multi-emotion annotation for some segments; Categorical (26 non-prototypical)

    2017 Multimodal Emotion Recognition Challenge (MEC 2017: http://www.chineseldc.org/htdocsEn/emotion.html)

    CHEAVD: A Chinese natural emotional audio-visual database

    84

    Li, Ya, et al. "CHEAVD: a Chinese natural emotional audiovisual database." Journal of Ambient Intelligence and Humanized Computing 8.6 (2017): 913-924.

    Available: Ya Li ([email protected])

  • CHEAVD MEC 2016: the multimodal emotion recognition

    challenge of CCPR 20161 (Cat.: 37.03)

    Chinese Speech Emotion Recognition2 (Cat.: 47.33) Transfer Learning of Deep Neural Network for

    Speech Emotion Recognition3 (Cat.: 50.01)

    85

    1Li, Ya, et al. "MEC 2016: the multimodal emotion recognition challenge of CCPR 2016." Chinese Conference on Pattern Recognition. Springer Singapore, 2016.

    2Zhang, Shiqing, et al. "Feature Learning via Deep Belief Network for Chinese Speech Emotion Recognition." Chinese Conference on Pattern Recognition. Springer Singapore, 2016.

    3Huang, Ying, et al. "Transfer Learning of Deep Neural Network for Speech Emotion Recognition." Chinese Conference on Pattern Recognition. Springer Singapore, 2016.

  • Language: Chinese; Participants: 44 (Male: 20; Female: 24)

    Recordings: Dual-channel Audio; HD Video; Manual Transcript; Electrocardiogram

    Total: 11 hours, 102 dyadic sessions (3 min/session); Sentences: 6029 utterances

    Labels: Perspectives: Peer, Director, Self, Naïve-Observer; Raters: 49; Continuous-in-time annotation; Discrete session-level annotation

    Dimensional (Valence-Activation); Categorical (6)

    NNIME: The NTHU-NTUA Chinese Interactive Multimodal Emotion Corpus

    86

    Huang-Cheng Chou, Wei-Cheng Lin, Lien-Chiang Chang, Chyi-Chang Li, Hsi-Pin Ma, Chi-Chun Lee "NNIME: The NTHU-NTUA Chinese Interactive Multimodal Emotion Corpus" in Proceedings of ACII 2017

    Available: Huang-Cheng Chou ([email protected])Chi-Chun Lee ([email protected])

  • NNIME

    Cross-Lingual Emotion Information1,3(session)

    (Val./Act.: 0.682/0.604)

    Dyad-Level Interaction2

    (Cat.: 0.65)

    87

1Chun-Min Chang, Bo-Hao Su, Shih-Chen Lin, Jeng-Lin Li, Chi-Chun Lee*, "A Bootstrapped Multi-View Weighted Kernel Fusion Framework for Cross-Corpus Integration of Multimodal Emotion Recognition" in Proceedings of ACII 2017

    2Yun-Shao Lin, Chi-Chun Lee*, "Deriving Dyad-Level Interaction Representation using Interlocutors Structural and Expressive Multimodal Behavior Features" in Proceedings of the International Speech Communication Association (Interspeech), pp. 2366-2370, 2017

    3Chun-Min Chang, Chi-Chun Lee*, "Fusion of Multiple Emotion Perspectives: Improving Affect Recognition Through Integrating Cross-Lingual Emotion Information" in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp.5820-5824, 2017

  • Access These Emotion Database

    88

    Year Database Website

    1997 DES http://kom.aau.dk/~tb/speech/Emotions/

    2000 GEMEP https://www.affective-sciences.org/gemep/

    2005 eNTERFACE' 05 http://www.enterface.net/enterface05/

    2007 HUMAINE http://emotion-research.net/download/pilot-db/

    2008 VAM http://emotion-research.net/download/vam

    2008 IEMOCAP http://sail.usc.edu/iemocap/

    2009 SAVEE http://kahlan.eps.surrey.ac.uk/savee/

    2010 CIT http://sail.usc.edu/CreativeIT/ImprovRelease.htm

    2010 SEMAINE https://semaine-db.eu/

    2013 RECOLA https://diuf.unifr.ch/diva/recola/download.html

    2016 CHEAVD Upon request

    2017 NNIME http://nnime.ee.nthu.edu.tw/

  • Key take-away

    ()

    ()

    ()

    : ?

    89

  • (Framework diagram) Data-driven AI Learning and Inference?

    90

  • (Framework diagram) Data-driven AI Learning and Inference?

    91

  • Speech

    Text

    Gesture

    Face

    Human Expression

    92

  • Paralinguistic Expression

    Linguistic Expression

    93

  • Sauter (2007): vocal expressions of emotion

    Achievement 88.4%; Amusement 90.4%; Contentment 52.4%; Pleasure 61.6%; Relief 83.9%

    73%-94%

    Confusion pairs: amusement / disgust; pleasure / sadness; sadness / relief (24%, 17.5%); amusement (12%, 7.9%)

    Categories: Achievement, Amusement, Anger, Contentment, Disgust, Pleasure, Relief, Sadness, Surprise

    Overall: 69.9%

    Sauter, Disa. An investigation into vocal expressions of emotions: the roles of valence, culture, and acoustic factors. University of London, University College London (United Kingdom), 2007.

    94

  • Laugh

    Cry

    Sigh

    Whisper

    Whine

    Laukka, Petri, et al. "Cross-cultural decoding of positive and negative non-linguistic emotion vocalizations." Frontiers in Psychology 4 (2013).

    Gupta, Rahul, et al. "Detecting paralinguistic events in audio stream using context in features and probabilistic decisions." Computer Speech & Language 36 (2016): 72-92.

    Laughter & Fillers (2015): IS2013 sub-challenge; AUC for detection: Laughter 95.3%, Fillers 90.4%

    Cross-Culture (2013): universal emotion, non-verbal signals; speakers from India, USA, Kenya, Singapore; listeners from Sweden

    95

  • ?

    Sahu, Saurabh, Rahul Gupta, Ganesh Sivaraman, Wael AbdAlmageed, and Carol Espy-Wilson. "Adversarial Auto-Encoders for Speech Based Emotion Recognition." Interspeech 2017, 1243-1247. doi: 10.21437/Interspeech.2017-1421.

    Rao, K. Sreenivasa, Shashidhar G. Koolagudi, and Ramu Reddy Vempada. "Emotion recognition from speech using global and local prosodic features." International Journal of Speech Technology 16.2 (2013): 143-160.

    Lalitha, S., et al. "Emotion detection using MFCC and Cepstrum features." Procedia Computer Science 70 (2015): 29-35.

    Huang, Che-Wei, and Shrikanth Narayanan. "Characterizing Types of Convolution in Deep Convolutional Recurrent Neural Networks for Robust Speech Emotion Recognition." arXiv preprint arXiv:1706.02901 (2017).

    Lee, Jinkyu, and Ivan Tashev. "High-level feature representation using recurrent neural network for speech emotion recognition." INTERSPEECH. 2015.

    Emo-DB: Prosodic features + SVM: 62.43%; MFCC + ANN: 85.7%

    Deep convolution; high-level representation (time series)

    96

    Dimos, Kostis, Leopold Dick, and Volker Dellwo. "Perception of levels of emotion in speech prosody." The Scottish Consortium for ICPhS (2015).

    Erickson, Donna. "Expressive speech: Production, perception and application to speech synthesis." Acoustical Science and Technology 26.4 (2005): 317-325.

    Sauter, Disa. An investigation into vocal expressions of emotions: the roles of valence, culture, and acoustic factors. University of London, University College London (United Kingdom), 2007.

    Emotional prosody does not only function categorically, distinguishing different emotions, but also indicates different degrees of the expressed emotion.

    pitch and pitch variation is especially important for people to recognize emotion from non-verbal sounds

    voice quality tension

    Some Experiments : change the sound (remove pitch, noisy channel, )

    (descriptors)

    A Review : Research Findings of Acoustic and Perceptual Studies

    97

  • Flow chart

    Learning Representation Discriminative Model

    98

  • Low-Level Descriptors (LLDs), computed on 10-15 ms frames:

    Mel Frequency Cepstral Coefficients, Pitch, Signal Energy, Loudness, Voice Quality (Jitter, Shimmer), Log Filterbank Energies, Linear Prediction Cepstral Coefficients, CHROMA and CENS Features (Music)

    Compute statistics (functionals) over the frame-level LLDs (see the sketch below)

    Continuous: Pitch, Energy, Formants

    Qualitative: Voice quality (harsh, tense, breathy)

    Spectral: LPC, MFCC, LFPC

    99
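    A minimal sketch of extracting a few frame-level LLDs and reducing them to utterance-level statistics ("functionals"); it assumes a 16 kHz mono file at the hypothetical path "utterance.wav" and uses librosa (toolkits such as openSMILE or Praat provide far richer descriptor sets):

    import numpy as np
    import librosa

    y, sr = librosa.load("utterance.wav", sr=16000)   # hypothetical input file

    # Frame-level LLDs with a 10 ms hop
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, n_fft=400, hop_length=160)
    energy = librosa.feature.rms(y=y, frame_length=400, hop_length=160)
    f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr, frame_length=1024, hop_length=160)

    # Stack LLDs (trim to a common number of frames)
    T = min(mfcc.shape[1], energy.shape[1], f0.shape[0])
    llds = np.vstack([mfcc[:, :T], energy[:, :T], f0[np.newaxis, :T]])

    # Utterance-level functionals: mean and standard deviation of each LLD
    functionals = np.concatenate([llds.mean(axis=1), llds.std(axis=1)])
    print(functionals.shape)   # one fixed-length feature vector per utterance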

  • Arias, Juan Pablo, Carlos Busso, and Nestor Becerra Yoma. "Shape-based modeling of the fundamental frequency contour for emotion detection in speech." Computer Speech & Language 28.1 (2014): 278-294.

    Emotionally salient temporal segments

    75.8% in binary emotion classification

    (Figure legend: dotted/dashed = subjective and deviation of subjective; solid = objective)

    100

  • Source-Filter model

    (Example) High arousal

    Physically: Vocal Production System (Respiration, Vocal Fold Vibration, Articulation)

    increased tension in laryngeal musculature

    raised subglottal pressure

    changed production of sound at the glottis

    vocal quality

    Johnstone, Tom, & Scherer, Klaus (2000). Vocal communication of emotion. Handbook of Emotions.

    101

  • Mel-scale Filter Bank

    The response of the basilar membrane as a function of frequency, measured at six different distances from the stapes

    The psychoacoustical transfer function (see the filter-bank sketch below)

    Stern, Richard M., and Nelson Morgan. "Features based on auditory physiology and perception." Techniques for Noise Robustness in Automatic Speech Recognition (2012): 193-227.

    102
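    A minimal sketch of constructing the mel-scale filter bank itself (triangular filters spaced on the perceptually motivated mel scale), using librosa with illustrative parameters:

    import librosa

    # 40 triangular filters over the 257 STFT bins of a 512-point FFT at 16 kHz
    mel_fb = librosa.filters.mel(sr=16000, n_fft=512, n_mels=40)
    print(mel_fb.shape)   # (40, 257): one row of spectral weights per mel band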

  • Common discriminative models (see the SVM sketch below):

    Support Vector Machine (SVM)

    Convolutional Neural Network (CNN)

    Hidden Markov Model (HMM)

    Recurrent Neural Network (RNN)

    Time-series models

    103
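    A minimal sketch of the classical recipe (an SVM on utterance-level functionals) using scikit-learn; the feature matrix and labels are random placeholders standing in for the output of a feature-extraction step:

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 28))        # placeholder utterance-level features
    y = rng.integers(0, 4, size=200)      # placeholder labels for 4 emotion classes

    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    scores = cross_val_score(clf, X, y, cv=5)   # 5-fold cross-validated accuracy
    print(scores.mean())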

  • End-to-End: From LLDs to Deep Learning

    Z. Aldeneh and E. M. Provost, "Using regional saliency for speech emotion recognition," 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, 2017, pp. 2741-2745. doi: 10.1109/ICASSP.2017.7952655

    C. W. Huang and S. S. Narayanan, "Deep convolutional recurrent neural network with attention mechanism for robust speech emotion recognition," 2017 IEEE International Conference on Multimedia and Expo (ICME), Hong Kong, 2017, pp. 583-588. doi: 10.1109/ICME.2017.8019296

    Signal -> Neural Network -> Emotion

    CNN for the time-series signal, with attention (see the sketch below)

    104
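    A minimal end-to-end sketch (not the cited papers' exact architectures): a 1-D CNN over the raw waveform followed by an LSTM and a softmax emotion classifier; the attention pooling of Huang & Narayanan (2017) is omitted for brevity, and the input length and class count are assumptions:

    import tensorflow as tf
    from tensorflow.keras import layers

    num_classes = 4
    model = tf.keras.Sequential([
        layers.Input(shape=(16000, 1)),   # 1 s of 16 kHz raw audio
        layers.Conv1D(64, kernel_size=80, strides=4, activation="relu"),
        layers.MaxPooling1D(pool_size=4),
        layers.Conv1D(128, kernel_size=3, activation="relu"),
        layers.MaxPooling1D(pool_size=4),
        layers.LSTM(64),                  # temporal modelling of the CNN features
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.summary()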

  • Feature-extraction toolkits:

    YAAFE, an Easy to Use and Efficient Audio Feature Extraction Software. B. Mathieu, S. Essid, T. Fillon, J. Prado, G. Richard. Proceedings of the 11th ISMIR conference, Utrecht, Netherlands, 2010.

    Florian Eyben, Felix Weninger, Florian Gross, Björn Schuller. "Recent Developments in openSMILE, the Munich Open-Source Multimedia Feature Extractor." In Proc. ACM Multimedia (MM), Barcelona, Spain, ACM, ISBN 978-1-4503-2404-5, pp. 835-838, October 2013. doi: 10.1145/2502081.2502224

    Paul Boersma & David Weenink (2013): Praat: doing phonetics by computer [Computer program].

    105

  • Paralinguistic Expression

    Linguistic Expression

    106

  • ?

    Schwarz-Friesel, Monika. "Language and emotion. The Cognitive Linguistic Perspective," in: Ulrike Lüdtke (ed.), Emotion in Language. Theory - Research - Application, Amsterdam (2015): 157-173.

    Lexicon, Grammar, Ideational Meaning

    Lindquist, Kristen A., Jennifer K. MacCormack, and Holly Shablack. "The role of language in emotion: predictions from psychological constructionism." Frontiers in psychology 6 (2015).

    developmental and cognitive science, demonstrating that language helps

    107

  • Human Behavior Evaluation

Couples Therapy

    Oral Presentation

    Reviews

    Hotels HBRNN

    Amazon Cross-Lingual

    Movie (93%), Book (92%), DVD (93%), PNN + RBM

    Tweets

    Positive & Negative

    DCNN & LSTM

Ain, Qurat Tul, et al. "Sentiment analysis using deep learning techniques: a review." Int J Adv Comput Sci Appl 8.6 (2017): 424.

    108

  • Review Article Social Media Talk

    , ?

    It's terrible!

    What Texts Tell Us (Topics), Emotional Polarity

    It's cool!

    Parts of Speech (POS) tags, N-Grams (see the sketch below)

    https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html

    Example Penn Treebank tags: VB, VBD, NN, NNS, JJ, JJR, JJS, IN, TO

    109
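    A minimal sketch of extracting Penn Treebank POS tags and word n-grams with NLTK (resource names differ slightly across NLTK versions, so both the old and new identifiers are requested):

    import nltk

    for res in ("punkt", "punkt_tab",
                "averaged_perceptron_tagger", "averaged_perceptron_tagger_eng"):
        nltk.download(res, quiet=True)   # unknown names are skipped without raising

    text = "It's cool!"
    tokens = nltk.word_tokenize(text)
    print(nltk.pos_tag(tokens))          # e.g. ('cool', 'JJ') for the adjective
    print(list(nltk.ngrams(tokens, 2)))  # word bigrams as simple lexical features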

  • Dictionary-Based Sentiment Analysis

    110

  • Changqin Quan and Fuji Ren. 2009. "Construction of a blog emotion corpus for Chinese emotional expression analysis." In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (EMNLP '09), Vol. 3. Association for Computational Linguistics, Stroudsburg, PA, USA, 1446-1454.

    Mohammad, Saif M., and Peter D. Turney. "Crowdsourcing a word-emotion association lexicon." Computational Intelligence 29.3 (2013): 436-465.

    Pennebaker, James W., et al. The development and psychometric properties of LIWC2015. 2015.

    :

    LIWC (Linguistic Inquiry Word Count) : ()644500/ : 406/499

Seed Word, Gold Standard (see the lexicon-scoring sketch below):

    1.

    2.

    111
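    A minimal sketch of dictionary-based polarity scoring with a tiny illustrative word list (real systems use resources such as LIWC or the NRC word-emotion lexicon):

    POSITIVE = {"cool", "love", "great", "happy"}
    NEGATIVE = {"terrible", "hate", "sad", "awful"}

    def lexicon_polarity(text):
        words = [w.strip("!.,?").lower() for w in text.split()]
        score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
        return "positive" if score > 0 else "negative" if score < 0 else "neutral"

    print(lexicon_polarity("It's cool!"))       # positive
    print(lexicon_polarity("It's terrible!"))   # negative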

  • Data-driven? Sentiment Analysis (Unsupervised): data-driven latent structure (representation) recognition

    112

  • Sentiment Analysis (Supervised)

0.76 / 0.6 / 0.79; 0.66; 0.73 (Naïve Bayes, SVM)

    Aman, Saima, and Stan Szpakowicz. "Identifying expressions of emotion in text." Text, speech and dialogue. Springer Berlin/Heidelberg, 2007.

    Feature RepresentationClassifier

    Emotion Label

    113

  • Deep Model

    Lopez, Marc Moreno, and Jugal Kalita. "Deep Learning applied to NLP." arXiv preprint arXiv:1703.03091 (2017).

    (Architecture diagram: tokens "I love it" -> Embedding -> LSTM -> "positive"; see the sketch below)

    114
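    A minimal sketch of the embedding-LSTM sentiment model in the diagram, in Keras; the vocabulary size and sequence length are illustrative placeholders:

    import tensorflow as tf
    from tensorflow.keras import layers

    vocab_size, max_len = 10000, 50
    model = tf.keras.Sequential([
        layers.Input(shape=(max_len,)),        # integer word indices
        layers.Embedding(vocab_size, 128),     # "Embed"
        layers.LSTM(64),                       # "LSTM"
        layers.Dense(1, activation="sigmoid"), # probability of "positive"
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.summary()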

  • ?

    115

  • Automatic Speech Recognition (ASR)

    f( ) = Speech Text

    Challenging Task

    speaker gender

    Mapping / Translation

    116

    phoneme

  • Aldeneh, Zakaria & Khorram, Soheil & Dimitriadis, Dimitrios & Mower Provost, Emily. (2017). Pooling acoustic and lexical features for the prediction of valence. 68-72. 10.1145/3136755.3136760.

    Affect: Natural Language, Non-Verbal, Speech, Bio-Information, Image

    Pooling Intermediate Representation

    Performance, Robustness

    117

  • Facial Action Coding System (FACS)

    Action Units (AU)

    118

  • Facial Action Coding System (FACS)

  • FACS

    The tool for annotating facial expressions

    What The Face Reveals is strong evidence for the fruitfulness of the systematic analysis of facial expression

    Paul Ekman and Wallace V. Friesen 1976

  • Action Unit (AUs)

    AUs are considered to be the smallest visually discernible facial movements

    As AUs are independent of any interpretation, they can be used as the basis for recognition of basic emotions

    It's an explicit means of describing all possible movements of the face in 46 action points

  • Action Units (AUs): FACS is a tool for measuring facial expressions. Each observable component of facial movement is called an AU.

    All facial expressions can be broken down into their constituent AUs

    AU 1: Inner Brow Raiser    AU 12: Lip Corner Puller

    AU 4: Brow Lowerer    AU 13: Cheek Puffer

    AU 7: Lid Tightener    AU 20: Lip Stretcher

  • AU framework

  • Facial Expressions of Emotion (e.g., happy, fear, disgust, surprise, etc.)

    Automatic face & facial feature detection

    Face alignment

    Multiple image windows at a variety of locations and scales

    Feature extraction: facilitate subsequent learning and generalization, leading to better human interpretation

    Image filter: modify or enhance the image

    Facial AU (e.g., AU1, AU7, AU6+AU15, etc.)

    Rule-based classifier

    e.g., Gabor filter coefficients

  • Facial Expressions of Emotion (e.g., happy, fear, disgust, surprise, etc.)

    Automatic face & facial feature detection

    Face alignment

    Multiple image windows at a variety of locations and scales

    Feature extraction: facilitate subsequent learning and generalization, leading to better human interpretation

    Image filter: modify or enhance the image

    Facial AU (e.g., AU1, AU7, AU6+AU15, etc.)

    Rule-based classifier

    "Recognizing action units for facial expression analysis." Tian, Y.-I., Takeo Kanade, and Jeffrey F. Cohn.

  • Recognize AUs for Facial Expression Analysis- Rule-based Classifier

    Informed by FACS AUs, they group the facial features into upper and lower parts, because the facial actions in the two parts are relatively independent for AU recognition (see the rule-based sketch below).

    [14] P. Ekman and W.V. Friesen, The facial action coding system: A technique for the measurement of facial movement

    single AU detection

    combined AU detection
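    A minimal sketch of a rule-based AU-to-emotion mapping. The AU combinations below are commonly cited EMFACS-style prototypes, not the exact rules of Tian et al.; a detected AU set matches an emotion if it contains all AUs of that prototype:

    PROTOTYPES = {
        "happiness": {6, 12},
        "sadness":   {1, 4, 15},
        "surprise":  {1, 2, 5, 26},
        "fear":      {1, 2, 4, 5, 7, 20, 26},
        "anger":     {4, 5, 7, 23},
        "disgust":   {9, 15, 17},
    }

    def classify_aus(detected):
        """Return every emotion whose prototype AUs are all present in `detected`."""
        return [emo for emo, aus in PROTOTYPES.items() if aus <= set(detected)]

    print(classify_aus({6, 12, 25}))   # ['happiness']
    print(classify_aus({1, 4, 15}))    # ['sadness']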

  • Recognize AUs for Facial Expression Analysis - Results

    AU detection (Ekman-Hager database):

    Recognition rate: Single AU detection vs. Combined AU detection

    Upper face: 75% vs. 86.7%

    Lower face: 95.8% vs. 90.7%

    AU detection (cross-database):

    Train databases: Ekman-Hager, Cohn-Kanade; Test databases: Cohn-Kanade, Ekman-Hager

    Recognition rate: Upper face 93.2% / 86.7%; Lower face 90.7% / 93.4%

    :

    AU,

  • Facial Expressions of Emotion (e.g., happy, fear, disgust, surprise, etc.)

    Automatic face & facial feature detection

    Face alignment

    Multiple image windows at a variety of locations and scales

    Feature extraction: facilitate subsequent learning and generalization, leading to better human interpretation

    Image filter: modify or enhance the image

    Facial AU (e.g., AU1, AU7, AU6+AU15, etc.)

    Rule-based classifier

    "Recognizing Facial Expressions of Emotion using Action Unit Specific Decision Thresholds." Mustafa Sert and Nukhet Aksoy.

    AAM face track model

  • Recognizing Facial Expressions of Emotion using Action Unit Specific Decision Thresholds

    Extract facial images with an Active Appearance Model (AAM) to form an appearance model

    Facial AU multi-class classification using ADT for both AU detection and facial expression recognition

    ADT learns a separate decision threshold for each AU category; an instance is assigned to a category if and only if the SVM decision score exceeds that category's threshold (see the thresholding sketch below):

    f(x) = w·φ(x) + b > θ_AU

    where φ(·) is the mapping function that maps the SVM input into a high-dimensional space
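    A minimal sketch of AU-specific decision thresholds on top of binary SVM scores: each AU detector keeps its own threshold instead of the default zero. The data, AU labels, and the threshold-selection heuristic are illustrative placeholders, not the paper's actual procedure:

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X_train, X_val = rng.normal(size=(300, 20)), rng.normal(size=(100, 20))
    au_labels = {au: rng.integers(0, 2, size=300) for au in (1, 4, 12)}  # per-AU 0/1 labels

    detectors, thresholds = {}, {}
    for au, labels in au_labels.items():
        clf = SVC(kernel="rbf").fit(X_train, labels)
        detectors[au] = clf
        scores = clf.decision_function(X_val)
        # AU-specific threshold: here, the validation-score percentile matching
        # the AU's training base rate (a stand-in for tuning on a validation metric)
        thresholds[au] = np.percentile(scores, 100 * (1 - labels.mean()))

    x = rng.normal(size=(1, 20))
    detected = {au for au, clf in detectors.items()
                if clf.decision_function(x)[0] > thresholds[au]}
    print(detected)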

  • Recognizing Facial Expressions of Emotion using Action Unit Specific Decision Thresholds

Table: prototypic and major variants of AU combinations for the facial expression "fear". "+" denotes logical AND; "," indicates logical OR

    Facial expression recognition accuracy of the proposed scheme. Bold bracketed numbers indicate best result, bold numbers denote second best

  • Recognizing Facial Expressions of Emotion using Action Unit Specific Decision Thresholds

    ADT-based AU detector along with the rule-based emotion classifier (B&D) outperforms the baseline methods (A&C)

Among the proposed methods, D gives the best results in all facial emotion categories except surprise

    The proposed ADT scheme outperforms the baseline method by an average F1-score of 6.383% for 17 AUs

    It gives superior performance in terms of F1-score compared with the baseline method for all AUs except AU2

  • Facial Expressions of Emotion (e.g., happy, fear, disgust, surprise, etc.)

    Automatic face & facial feature detection

    Face alignment

    Multiple image windows at a variety of locations and scales

    Feature extraction: facilitate subsequent learning and generalization, leading to better human interpretation

    Image filter: modify or enhance the image

    Facial AU (e.g., AU1, AU7, AU6+AU15, etc.)

    Rule-based classifier

    "Compound facial expressions of emotion: from basic research to clinical applications." Shichuan Du and Aleix M. Martinez.

    Observations under distinct compound emotions

  • Compound facial expressions of emotion

  • Compound facial expressions of emotion

    AU intensity shown in a cumulative histogram for each AU and emotion category

    The x-axis in these histograms specifies the intensity of activation The y-axis in these histograms defines the cumulative percentage of intensity

    (scale 0 to 1)

    Numbers between zero and one specify the percentage of people using the specified and smaller intensities.

    Fig. AUs used to express a compound emotion are consistent with the AUs used to express its component categories

  • Key take-away

    AU

    AU ! ! !

    AUDNN

    ++

    135

  • 136

    Affect: Natural Language, Non-Verbal, Speech, Physiology, Face, Body Gestures

    ?

  • (Figure: general features of body movement, items 1-7; labels not preserved)

    reference: de Meijer, M. The contribution of general features of body movement to the attribution of emotions. Journal of Nonverbal Behavior 13, 4 (1989), 247-268.

    137

  • Psychology: Bull, P. E. Posture and gesture. Pergamon Press, 1987. Pollick, F. E., Paterson, H. M., Bruderlin, A., and Sanford, A. J. Perceiving affect from arm movement. Cognition 82, 2 (2001), B51-B61.

    Coulson, M. Attributing emotion to static body postures: Recognition accuracy, confusions, and viewpoint dependence. Journal of Nonverbal Behavior 28, 2 (2004), 117-139.

    Boone, R. T., and Cunningham, J. G. Children's decoding of emotion in expressive body movement: The development of cue attunement. Developmental Psychology 34 (1998), 1007-1016.

    de Meijer, M. The contribution of general features of body movement to the attribution of emotions. Journal of Nonverbal Behavior 13, 4 (1989), 247-268.

    Engineering: Balomenos, T., Raouzaiou, A., Ioannou, S., Drosopoulos, A., Karpouzis, K., and Kollias, S. Emotion analysis in man-machine interaction systems. In Machine Learning for Multimodal Interaction. Springer, 2005, 318-328.

    Coulson, M. Attributing emotion to static body postures: Recognition accuracy, confusions, and viewpoint dependence. Journal of Nonverbal Behavior 28, 2 (2004), 117-139.

    reference: Stefano Piana, Alessandra Staglianò, Francesca Odone, Alessandro Verri, Antonio Camurri, Real-time Automatic Emotion Recognition from Body Gestures, in ArXiv 2014

    138

  • ?

    12 actors (four female and eight male), aged between 24 and 60; a total of about 100 videos; separate clips of expressive gestures

    reference: 1) Stefano Piana, Alessandra Staglianò, Francesca Odone, Alessandro Verri, Antonio Camurri, Real-time Automatic Emotion Recognition from Body Gestures, in ArXiv 2014

    2) Amol S. Patwardhan and Gerald M. Knapp, Augmenting Supervised Emotion Recognition with Rule-Based Decision Model, in ArXiv 2016

    Qualisys, Kinect

    139

  • Data Validation Human annotation?

    The sole 3D skeleton is a guarantee that the user is not exploiting other information

    Not easy for humans to recognize emotion based only on gesture

    reference: Stefano Piana, Alessandra Staglian, Francesca Odone, Alessandro Verri, Antonio Camurri, Real-time Automatic Emotion Recognition from Body Gestures in ArXiv 2014 140

  • Skeleton based feature

    anger sadness

    happiness fear

    surprise disgust

    reference: 1) Stefano Piana, Alessandra Staglianò, Francesca Odone, Alessandro Verri, Antonio Camurri, Real-time Automatic Emotion Recognition from Body Gestures, in ArXiv 2014; 2) Piana, S., Staglianò, A., Camurri, A., and Odone, F. A set of full-body movement features for emotion recognition to help children affected by autism spectrum condition. In IDGEI International Workshop (2013).

    Histogram: energy at each frame (see the sketch below)

    141
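    A minimal sketch of a kinetic-energy-style movement feature from 3-D skeleton joints (frames x joints x xyz), in the spirit of the per-frame energy histograms above; the joint trajectories here are random placeholders:

    import numpy as np

    rng = np.random.default_rng(0)
    joints = rng.normal(size=(100, 15, 3))                # 100 frames, 15 joints, xyz

    velocity = np.diff(joints, axis=0)                    # frame-to-frame displacement
    energy_per_frame = (velocity ** 2).sum(axis=(1, 2))   # summed squared joint speeds

    # Simple clip-level descriptors of movement activation
    features = {
        "mean_energy": energy_per_frame.mean(),
        "max_energy": energy_per_frame.max(),
        "energy_std": energy_per_frame.std(),
    }
    print(features)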

  • Classification Result

    Qualisys Data with 310 gestures

    Kinect Data with 579 gestures

    Clean Dataset

    Noisy Dataset

Almost the same as humans' recognition ability

    142

  • Skeleton Capture Methods

    Qualisys: expensive, sophisticated system with multiple high-speed cameras

    Kinect: cheap, easy-to-get RGB-D 3D camera device

    OpenPose: free, new software system based on CNNs

    reference: https://itp.nyu.edu/classes/dance-f16/kinect/, https://github.com/CMU-Perceptual-Computing-Lab/openpose, https://www.qualisys.com/

    143

  • OpenPose: CNN based Method

    144

  • Pose difference/movement indicative of arousal mostly

    145

    Affect: Natural Language, Non-Verbal, Speech, Physiology, Face, Body Gestures

  • (Framework diagram) Expressive Data: AI Learning and Inference?

    146

  • (Framework diagram) Internal Data: AI Learning and Inference?

    147

  • Polyvagal theory

    Stephen Porges

    148

    Immobilization

    Fight-flight (mobilization)

    149

  • reference: D.S. Quintana, A.J. Guastella, T. Outhred, I.B. Hickie, and A.H. Kemp. "Heart rate variability is associated with emotion recognition: direct evidence for a relationship between the autonomic nervous system and social cognition." Int. J. of Psychophysiol, 86(2):168-172, 2012.

    http://blog.sina.com.cn/s/blog_753e49f90100pop2.html

    http://www.xzbu.com/6/view-2908185.htm

    150

  • HRV (Heart Rate Variability)

    ANS

    (HRV)

    (HRV analysis) [2]

    reference: 1) María Teresa Valderas, Juan Bolea, Pablo Laguna, Montserrat Vallverdú, Raquel Bailón, "Human Emotion Recognition Using Heart Rate Variability Analysis with Spectral Bands Based on Respiration" in Engineering in Medicine and Biology Society (EMBC), 2015 37th Annual International Conference of the IEEE.

    2) Task Force of the European Society of Cardiology and the North American Society of Pacing and Electrophysiology (1996). Heart rate variability. Standards of measurement, physiological interpretation, and clinical use. Eur Heart J 17(3):354-81.

    3) D.S. Quintana, A.J. Guastella, T. Outhred, I.B. Hickie, and A.H. Kemp. Heart rate variability is associated with emotion recognition: direct evidence for a relationship between the autonomic nervous system and social cognition. Int. J. of Psychophysiol, 86(2):168-172, 2012.

    151

  • HRV frequency-domain measures:

    Total power (TP) [ms²]: ≤ 0.4 Hz

    Very low frequency power (VLFP) [ms²]: ≤ 0.04 Hz

    Low frequency power (LFP) [ms²]: 0.04-0.15 Hz

    High frequency power (HFP) [ms²]: 0.15-0.4 Hz

    Normalized LFP (nLFP) [n.u.]: LF/(TP-VLF)

    Normalized HFP (nHFP) [n.u.]: HF/(TP-VLF)

    LF/HF ratio

    https://zh.wikipedia.org/wiki/%E5%BF%83%E7%8E%87%E8%AE%8A%E7%95%B0%E5%88%86%E6%9E%90

    152

  • Emotion elicitation: real experiences, film clips, problem solving, computer game interfaces, images, spoken words, music

    Movie-clip method: an emotion-induction method verified by previous studies to be more efficient than others. 4 films (3-10 min each); 4 emotions: angry, fear, sad and happy. ECG data was recorded for 90 sec, 2 min before the end of the movies.

    reference: 1) Han Wen Guo, Yu Shun Huang, Jen Chien Chien, Jiann Shing Shieh, "Short-term Analysis of Heart Rate Variability for Emotion Recognition via a Wearable ECG Device" in Intelligent Informatics and Biomedical Sciences (ICIIBMS), 2015

    2) Mimma Nardelli, Gaetano Valenza, Alberto Greco, Antonio Lanata, Enzo Pasquale Scilingo, "Recognizing Emotions Induced by Affective Sounds through Heart Rate Variability" in IEEE Transactions on Affective Computing, Vol. 6, No. 4, October-December 2015

    induced

    153

  • ECG process pipeline

    reference: Abhishek Vaish and Pinki Kumari, A Comparative Study on Machine Learning Algorithms in Emotion State Recognition Using ECG in Proceedings of the Second International Conference on Soft Computing for Problem Solving (SocProS 2012), December 28-30, 2012

    154

  • ECG Feature Extraction: HRV

    Time Domain Feature Frequency Domain Feature

    reference: Han Wen Guo, Yu Shun Huang, Jen Chien Chien, Jiann Shing Shieh, Short-term Analysis of Heart Rate Variability for Emotion Recognition via a Wearable ECG Device in Intelligent Informatics and Biomedical Sciences (ICIIBMS), 2015

    1. MeanRRI: average of the resultant RR intervals

    2. CVRR: the ratio of the standard deviation to the mean of the RR intervals

    3. SDRR: standard deviation of the RR intervals

    4. SDSD: standard deviation of the successive differences of the RR intervals

    1. LF (low frequency): standardized LF power (0.04-0.15 Hz)

    2. HF (high frequency): standardized HF power (0.15-0.4 Hz)

    3. LHratio: the ratio LF/HF

    Statistic features: evaluate the shapes of the probability distributions (see the sketch below)

    155
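    A minimal sketch of the HRV features listed above, computed from RR intervals in seconds; the RR series here is synthetic, whereas a real pipeline would obtain it from R-peak detection on the ECG:

    import numpy as np
    from scipy.signal import welch

    rng = np.random.default_rng(0)
    rr = 0.8 + 0.05 * rng.normal(size=300)     # ~300 synthetic RR intervals (s)

    # Time-domain features
    mean_rri = rr.mean()
    sdrr = rr.std(ddof=1)
    cvrr = sdrr / mean_rri
    sdsd = np.diff(rr).std(ddof=1)

    # Frequency-domain features: resample to an even 4 Hz grid, then Welch PSD
    t = np.cumsum(rr)
    fs = 4.0
    t_even = np.arange(t[0], t[-1], 1 / fs)
    rr_even = np.interp(t_even, t, rr)
    f, psd = welch(rr_even - rr_even.mean(), fs=fs, nperseg=256)

    df = f[1] - f[0]
    def band_power(lo, hi):
        mask = (f >= lo) & (f < hi)
        return psd[mask].sum() * df            # rectangle-rule band integration

    lf, hf = band_power(0.04, 0.15), band_power(0.15, 0.40)
    print({"MeanRRI": mean_rri, "SDRR": sdrr, "CVRR": cvrr, "SDSD": sdsd,
           "LF": lf, "HF": hf, "LF/HF": lf / hf})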

  • Analysis on Feature

    Time Domain Feature Frequency Domain Feature Statistics Feature

    reference: Han Wen Guo, Yu Shun Huang, Jen Chien Chien, Jiann Shing Shieh, Short-term Analysis of Heart Rate Variability for Emotion Recognition via a Wearable ECG Device in Intelligent Informatics and Biomedical Sciences (ICIIBMS), 2015

    156

  • Classifier

    reference: Han Wen Guo, Yu Shun Huang, Jen Chien Chien, Jiann Shing Shieh, Short-term Analysis of Heart Rate Variability for Emotion Recognition via a Wearable ECG Device in Intelligent Informatics and Biomedical Sciences (ICIIBMS), 2015 157

  • ,

    158


  • 160

  • Group-Level Emotion, Thin Slice

    Multi-Task

    Cross Corpus, Common ground

    Cross-Lingual Perspective

    161

  • LLD

    : ?

    Encoding

  • Result & Discussion (Binary Classification: Unweighted Average Recall)

    Database Act. Feature Rep. Val. Feature Rep.

    CIT 0.658 Praat BoAW 0.613 Praat FV

    IEMOCAP 0.769 EGEMAPS Func. 0.663 Praat FV

    NNIME 0.65 Praat FV 0.564 Praat BoAW

    RECOLA 0.634 EGEMAPS Func. 0.602 Praat BoAW

    VAM 0.811 ComP_LLD FV 0.665 EGE_LLD BoW

  • Variational Deep Embedding Fisher Scoring

    : ?

    : !

  • Generated Perspectives Multi-view Kernel Fusion

    1. Chun-Min Chang, Bo-Hao Su, Shih-Chen Lin, Jeng-Lin Li, Chi-Chun Lee, "A Bootstrapped Multi-View Weighted Kernel Fusion Framework for Cross-Corpus Integration of Multimodal Emotion Recognition" in Proceedings of ACII 2017

    2. Chun-Min Chang, Chi-Chun Lee, "Fusion of Multiple Emotion Perspectives: Improving Affect Recognition Through Integrating Cross-Lingual Emotion Information" in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2017

    ?

  • Group-Level Emotion, Thin Slice

    Multi-Task

    Cross Corpus, Common ground

    Cross-Lingual Perspective

    166

  • memory

    cognitive

    emotion

    167


  • alexithymia

    169


  • (/)Mental Well-being

    ()

    Ref: https://ac.els-cdn.com/S1877042815003080/1-s2.0-S1877042815003080-main.pdf?_tid=238e46fe-da36-11e7-86e8-00000aab0f26&acdnat=1512531351_6db4641b5d3531d365e0f207f474d65f

    171

  • ()Well-beings

    (Boredom, it turns out, can be a dangerous and disruptive state of mind that damages your health)(Manns )

    Ref: http://alcoholrehab.com/drug-addiction/boredom-and-substance-abuse/

    Boredom

    Ref: On the Function of Boredom (Shane W. Bench) 172

  • Ref: The Facilitation of Social-Emotional Understanding and Social Interaction in High-Functioning Children with Autism: Intervention Outcomes

    :

    Ref: Social Skills Deficits in Children with Autism Spectrum Disorders: Evidence Based Interventions

    173

  • (Pintrich, 1991, p. 199)

    (special issue of The Educational Psychologist)

    Ref: The Importance of Students' Goals in Their Emotional Experience of Academic Failure: Investigating the Precursors and Consequences of Shame (Jeannine E. Turner)

    174

  • The Importance of Students' Goals in Their Emotional Experience of Academic Failure: Investigating the Precursors and Consequences of Shame

    The Importance of Students' Goals in Their Emotional Experience of Academic Failure: Investigating the Precursors and Consequences of Shame (Jeannine E. Turner)

    175

  • FMRI()

    Ref: https://blog.hubspot.com/marketing/emotions-in-advertising-examples

    176

  • (communication and perception of emotion in music)

    (emotional consequences of music listening)

    (predictors of music preferences)

    Swathi Swaminathan

    Ref: Current Emotion Research in Music Psychology (Swathi Swaminathan )177

  • SENSE EMOTION

    ..

    178


  • functional Magnetic Resonance Imaging (fMRI)

    180

  • 181

  • Uses a standard MRI scanner Acquires a series of images (numbers) Measure changes in blood oxygenation Use non-invasive, non-ionizing radiation Can be repeated many times; can be used for a

    wide range of subjects

    Combines good spatial and reasonable temporal resolution

    Synopsis of fMRI

    182

  • Blood-Oxygen-Level dependent (BOLD)

    183

  • Emotion Perception Decoding from fMRI

    fMRI Dataset

    Interactionbehavior

    SPM Preprocessing

    Emotion

    MachineLearning

    Behaviorobservation

    184

  • Emotional modules

    185

  • Co-activation graph for each emotion category

    A) Force-directed graphs for each emotion category, based on the Fruchterman-Reingold spring algorithm

    B) The same connections in the anatomical space of the brain.

    186 -

  • ?

    187

    ?

  • (Framework diagram; labels not preserved) INFERENCE?

    188

  • 189

    ?

  • Our Research: Human-centered Behavioral Signal Processing (BSP)

    Prof. Shrikanth Narayanan

    Seek a window into the human mind and traits through an engineering approach

    S. Narayanan and P. G. Georgiou, "Behavioral signal processing: Deriving human behavioral informatics from speech and language," Proceedings of the IEEE, vol. 101, no. 5, pp. 1203-1233, 2013.

    Daniel Bone, Chi-Chun Lee, Theodora Chaspari, James Gibson, Shrikanth Narayanan, "Signal Processing and Machine Learning for Mental Health Research and Clinical Applications", in IEEE Signal Processing Magazine

    (EECS713) Behavioral Informatics and Interaction Computation Laboratory (BIIC)

  • (Signal Processing)

    (Machine Learning)

    (Decision Analytics)

    High-dimensional Behavior Space, Non-linear Predictive Recognition, Multimodal Integration, Experts' Decision Mechanism

    Spatial-temporal modeling, De-noising, Feature extraction

    Supervised, Unsupervised, Semi-supervised

    Our Technology: Human-centric Decision Analytics Research & Development

    Core Technology

    Speech & Language: Diarization, SpeakerID, ASR, Paralinguistic Descriptors, Emotion-AI, Sentiment, Word-topic Representation

    Computer Vision: Segmentation, Tracking, Image-Video Descriptors

    Multimodal Fusion: Joint speech-language-gesture modeling for multimodal prediction, Multi-party interaction modeling

    Representation Learning: Behavior embedded space learning, clinical health informatics data representation

    Predictive Learning: Deep-learning, machine-learning based predictive modeling

  • BIIC: Interdisciplinary Research

    ASD

    PAIN

    EHR

    fMRI

    EMO-AI

    ORAL

    Mental Health

    Clinical Health

    Affective Computing

    Education

    Our Application: Human-centered Exemplary BSP Domains

    Flow

    Consumer Behavior

    EMO-AI

    Neuroscience

    KEY APPLICATIONS

    Affective Computing

    Mental Health

    Clinical Health

    Education

    Neuroscience

    Consumer Behavior

    :

  • 193

    :

    Computing beyond status-quo in making a positive impact

  • Factual Conceptual Procedural Metacognitive

    Computation Blueprints

    Behavior Computing

    Health Analytics

    Affect Recognition

    Empathic Computing

    Social Computing

    Value-Sensitive Technology

    Affective Feedback

    Interpersonal Relationship Computing

    Cognitive Feedback

    Fulfillment Empowerment

    Motivation

    Internal States

    External Functions

    Our Vision: Human-Centric Computing (HCC): computationally innovate human-centric empowerment, enabling next-generation entity intelligence

  • 195

    PHD

    BIIC LAB MEMBERS

  • 196

    BIIC Lab @ NTHU EE

    http://biic.ee.nthu.edu.tw

    THANK YOU . . .