The Acoustic Emotion Gaussians Model for Emotion-based Music Annotation and Retrieval

Reporter: 張書堯


Page 1: The Acoustic Emotion Gaussians Model for Emotion-based Music Annotation and Retrieval

Reporter: 張書堯

1

Page 2

Author

• Ju-Chiang Wang (王如江)
• Experience:
  • Institute of Information Science, Academia Sinica, Taipei, Taiwan
  • Department of Electrical Engineering, NTU
• Major Research Areas:
  • Multimedia Information Retrieval
  • Machine Learning
  • Music and Speech Signal Processing
  • Modern Music Production and Engineering
  • Multimedia Interface Design

2

Page 3

Author

• Yi-Hsuan Yang (楊奕軒)
• Experience:
  • Institute of Information Science, Academia Sinica, Taipei, Taiwan
  • Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan
• Major Research Areas:
  • Music information retrieval, analysis, and visualization
  • Machine learning
  • Multimedia systems
  • Smart phone and cloud-based applications
  • Lyrics analysis

3

Page 4

Author

• Hsin-Min Wang (王新民)
• Experience:
  • Institute of Information Science, Academia Sinica, Taipei, Taiwan
  • Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan
• Major Research Areas:
  • Spoken Language Processing
  • Natural Language Processing
  • Multimedia Information Retrieval
  • Pattern Recognition

4

Page 5

Author

• Shyh-Kang Jeng (鄭士康)
• Experience:
  • Department of Electrical Engineering, NTU
• Major Research Areas:
  • Design of Antennas and Circuits for Wireless Communications
  • Music Signal Processing
  • Artificial Intelligence and Multimedia

5

Page 6

Introduction

• Music research on emotion
  • Develop a computational model that comprehends the affective content of a music signal and organizes a music collection according to emotion.
• Annotation of music emotion poses two main challenges:
  • Perception of emotion is inherently subjective.
  • Music emotion is a time-varying dynamic.

6

Page 7

Introduction

• Perception of emotion is inherently subjective
  • People perceive emotion differently when listening to the same song.
  • There is no objective ground truth: a single label does not apply equally well to every listener.
  • This calls for a change in the way annotation is done; e.g., [17] learns from labels provided by different annotators and presents the final result as a soft (probabilistic) emotion assignment instead of a hard (deterministic) one.

[17] V. C. Raykar, S. Yu, L. H. Zhao, G. H. Valadez, C. Florin, L. Bogoni, and L. Moy. Learning from crowds. J. Machine Learning Res., 11:1297-1322, 2010.

7

Page 8

Introduction

• Music emotion is a time-varying dynamic
  • Music's most expressive qualities are related to its structural changes across time.
  • Following [5] and [23], emotion can be treated as a numerical value (instead of a discrete label).
  • The value ranges over a number of dimensions, such as valence and activation, the two most fundamental dimensions found by psychologists [18].
    • Valence: the intrinsic attractiveness (positive valence) or averseness (negative valence) of an event, object, or situation.
    • Activation: the stimulation of the cerebral cortex into a state of general wakefulness or attention (the intensity of emotion).

[5] A. Gabrielsson. Emotion perceived and emotion felt: Same or different? Musicae Scientiae, pages 123-147, 2002.
[23] E. Schubert. Modeling perceived emotion with continuous musical features. Music Perception, 21(4):561-585, 2004.
[18] J. A. Russell. A circumplex model of affect. J. Personality and Social Psychology, 39(6):1161-1178, 1980.

8

Page 9

Introduction

• Music emotion is a time-varying dynamic
  • In this way, music emotion recognition becomes the prediction of moment-to-moment valence and activation (VA) values, corresponding to a series of points in the emotion space.


9

Page 10

Introduction

• Early research on emotion recognition represented the emotion of a song by a single discrete emotion label.
• Recent years have seen attempts to model the emotion of a song by a probability distribution in the emotion space, e.g., [18].

[18] J. A. Russell. A circumplex model of affect. J. Personality and Social Psychology, 39(6):1161-1178, 1980.

10

Page 11

Introduction

• The developer of an emotion-based music retrieval system can better understand how likely a specific emotional expression (in terms of VA values) would be perceived at each temporal moment.
• From the application point of view:
  • Create a simple user interface for music retrieval through the specification of a point or a trajectory in the emotion space.
  • Estimate the emotion.
• From the theoretical point of view:
  • The fixed label gives way to a time-varying stochastic distribution.
  • That is, a functional music retrieval system cannot be dissociated from the underlying psychological implications.

11

Page 12

Introduction

• This paper proposes a novel acoustic emotion Gaussians (AEG) model.
  • It realizes the generative process of music emotion perception in a probabilistic and parametric framework.
• The AEG model learns from data two sets of Gaussian mixture models (GMMs), named the acoustic GMM and the VA GMM.
  • Acoustic GMM: low-level acoustic feature space
  • VA GMM: high-level emotion space
• A set of latent feature classes is introduced to provide the end-to-end linkage that aligns the two spaces.

12

Page 13

Introduction

• The AEG model has three additional advantages:
  • As a generative model, AEG permits easy and straightforward interpretation of the model learning and mapping process.
  • It involves lightweight computation for emotion prediction, and can track the emotion distribution over the course of a music piece in real time.
  • It provides great flexibility for future extension.

13

Page 14

Introduction

• A comprehensive performance study is conducted to demonstrate the superior accuracy of AEG over its predecessors, using two emotion-annotated music corpora: MER60 [30] and MTurk [26].
• Moreover, for the first time, a quantitative evaluation of emotion-based music retrieval is reported.

[30] Y.-H. Yang and H. H. Chen. Predicting the distribution of perceived emotions of a music signal for content retrieval. IEEE Trans. Audio, Speech and Lang. Proc., 19(7):2184-2196, 2011.
[26] J. A. Speck, E. M. Schmidt, B. G. Morton, and Y. E. Kim. A comparative study of collaborative vs. traditional musical mood annotation. In ISMIR, 2011.

Page 15

Related Work

• Regression approach [4,16,31]
  • Early work considered that the perceived emotion of a music clip can be represented as a single point (which, in practice, is not a true ground truth).
  • Multiple Linear Regression (MLR), Support Vector Regression (SVR)
  • This approach has been applied in the VA space in recent years.
• The heatmap approach [21,30]
  • Quantizes each emotion dimension into G equally spaced discrete samples, leading to a G × G grid representation of the emotion space.
  • Trains regression models to predict the emotion intensity at each grid point.
  • Uses a discrete rather than continuous representation of the VA space, i.e., the emotion intensity cannot be regarded as a probability estimate in a strict sense.

15

Page 16

Related Work

• Gaussian parameter approach [19,26,30]
  • Directly learns five regression models to predict the parameters of the annotation Gaussian (the means, variances, and covariance).
  • Allows easier performance analysis.

16

Page 17

Related Work

• References for these approaches:
• Regression approach [4,16,31]
  [4] T. Eerola, O. Lartillot, and P. Toiviainen. Prediction of multidimensional emotional ratings in music from audio using multivariate regression models. In ISMIR, 2009.
  [16] K. F. MacDorman, S. Ough, and C.-C. Ho. Automatic emotion prediction of song excerpts: Index construction, algorithm design, and empirical comparison. J. New Music Res., 36(4):281-299, 2007.
  [31] Y.-H. Yang, Y.-C. Lin, Y.-F. Su, and H. H. Chen. A regression approach to music emotion recognition. IEEE Trans. Audio, Speech and Lang. Proc., 16(2):448-457, 2008.
• The heatmap approach [21,30]
  [21] E. M. Schmidt and Y. E. Kim. Modeling musical emotion dynamics with conditional random fields. In ISMIR, 2011.
  [30] Y.-H. Yang and H. H. Chen. Predicting the distribution of perceived emotions of a music signal for content retrieval. IEEE Trans. Audio, Speech and Lang. Proc., 19(7):2184-2196, 2011.
• Gaussian parameter approach [19,26,30]
  [19] E. M. Schmidt and Y. E. Kim. Prediction of time-varying musical mood distributions from audio. In ISMIR, 2010.
  [26] J. A. Speck, E. M. Schmidt, B. G. Morton, and Y. E. Kim. A comparative study of collaborative vs. traditional musical mood annotation. In ISMIR, 2011.
  [30] Y.-H. Yang and H. H. Chen. Predicting the distribution of perceived emotions of a music signal for content retrieval. IEEE Trans. Audio, Speech and Lang. Proc., 19(7):2184-2196, 2011.

17

Page 18

Outline

Framework Overview

Learning the AEG Model

Learning the VA GMM

Emotion-based annotation and retrieval using AEG

Interpretation of AEG

Evaluation and Result

Conclusion and Future Work

18

Page 19

Framework overview

• The system architecture comprises:
  • The Acoustic Emotion Gaussian Model
  • Generation of Music Emotion
  • Emotion-based Music Retrieval

19

Page 20

Framework overview

• The Acoustic Emotion Gaussian (AEG) Model
  • The AEG model realizes the generation of music emotion in the emotion space.
  • The model makes two assumptions:
    • The emotion annotations of a music clip can be parameterized by a bivariate Gaussian distribution.
    • Music clips with similar acoustic features result in similar emotion distributions in the VA space.
  • However, this correspondence is very difficult to identify directly.
  • Thus, this paper introduces a set of latent feature classes (in the generative process) as the bridge between the two spaces.

20

Page 21

Framework overview

• Generation of music emotion
  • Advantage: in the emotion prediction phase, the final emotion prediction can be intuitively interpreted for the user with only a mean and a covariance, rather than the complicated weighted VA GMM.

Page 22

Framework overview

• Emotion-based retrieval
• The feature indexing phase has two indexing approaches:
  • Fixed-dimension vector: indexing with the acoustic GMM posterior.
  • Single 2-D Gaussian: indexing with the predicted emotion distribution of a clip given by automatic emotion prediction.
• The music retrieval phase has two matching methods:
  • Pseudo-song-based matching
  • Distribution-likelihood-based matching

22

Page 23

Framework overview

• Flowchart of the content-based music search system using a VA-based emotion query.

23

Page 24

Framework overview

• Pseudo-song estimation
  • In the fold-in method, a given query point is transformed into probabilities.
  • The resulting probabilities represent the importance of the k-th latent VA Gaussian for the input point.
  • They have the same form as the acoustic GMM posteriors that are used to index the music clips in the music database.

24

Page 25

Learning the AEG Model

• To start the generative process of the AEG model, we utilize a universal acoustic GMM, pre-learned with the EM algorithm on a universal set of frame vectors F, to span a probabilistic space with a set of diverse acoustic Gaussians.
• The learned acoustic GMM is expressed as follows:

25

k: index of a component Gaussian; x: frame-based feature vector; π_k: prior weight of the k-th latent feature class; μ_k: mean vector; Σ_k: covariance matrix
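The equation itself did not survive extraction; with the symbols in the legend above, a K-component acoustic GMM of the form the slide describes would read:

```latex
p(x) \;=\; \sum_{k=1}^{K} \pi_k \, \mathcal{N}\!\left(x \mid \mu_k, \Sigma_k\right)
```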

Page 26

Learning the AEG Model

• Suppose we have an annotated music corpus X with N clips; each clip is denoted s_i, and its t-th frame vector is denoted x_{i,t}, where T_i is the number of frames.
• The acoustic posterior probability of the k-th Gaussian for x_{i,t} is computed by Bayes' rule.
• In the implementation, the paper assumes the mixture prior (i.e., π_k) to be uniform, because a learned prior was not found useful in previous work [28].

26

[28] J.-C. Wang, H.-S. Lee, H.-M. Wang, and S.-K. Jeng. Learning the similarity of audio music in bag-of-frames representation from tagged music data. In ISMIR, 2011.

Page 27

Learning the AEG Model

• Thus, the clip-level acoustic GMM posterior can be summarized by averaging the frame-level ones.
• The acoustic GMM posterior of clip s_i is represented by a vector θ_i, whose k-th component θ_{i,k} is the average frame-level posterior of the k-th Gaussian.
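The two-step computation above (frame-level posteriors, then clip-level averaging) can be sketched as follows. This is a minimal illustration with made-up GMM parameters, not the authors' implementation; the paper pre-learns the acoustic GMM with EM.

```python
# Sketch of the acoustic GMM posterior computation described above.
# The GMM parameters below are illustrative; the paper learns them with EM.
import numpy as np
from scipy.stats import multivariate_normal

def frame_posteriors(frames, weights, means, covs):
    """Posterior p(k | x_t) of each Gaussian for every frame x_t."""
    K = len(weights)
    lik = np.column_stack([
        weights[k] * multivariate_normal.pdf(frames, means[k], covs[k])
        for k in range(K)
    ])                                      # shape (T, K)
    return lik / lik.sum(axis=1, keepdims=True)

def clip_posterior(frames, weights, means, covs):
    """Clip-level acoustic GMM posterior: average of the frame posteriors."""
    return frame_posteriors(frames, weights, means, covs).mean(axis=0)

rng = np.random.default_rng(0)
frames = rng.normal(size=(100, 2))          # 100 frames of 2-D features
weights = np.array([0.5, 0.5])              # uniform mixture prior, as in the paper
means = [np.zeros(2), np.ones(2)]
covs = [np.eye(2), np.eye(2)]
theta = clip_posterior(frames, weights, means, covs)
print(theta)                                # K posteriors that sum to 1
```

The resulting vector theta is the fixed-dimension clip representation used later for indexing and for weighting the VA GMM.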

27

Page 28

Prior Model for Emotion Annotation

• This paper introduces a user prior model to express the contribution of each individual subject.
• Given the emotion annotations of a clip s_i:
  • e_{i,j}: the annotation by the j-th subject
  • U_i: the number of subjects who have annotated the clip
• We build a label confidence model with a Gaussian distribution whose mean and covariance are estimated from the clip's annotations.

28

Page 29

Prior Model for Emotion Annotation

• The posterior probability of user u_j can be calculated by normalizing the confidence likelihood of e_{i,j} over the cumulative confidence likelihood of all users for the clip.

29

This posterior is referred to as the clip-level user prior, as it indicates the confidence of each annotation for the clip.
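The normalization just described is presumably of the standard form below; the notation is reconstructed (e_{i,j} is the j-th annotation of clip s_i, U_i its number of annotators, and N(· | μ_i, Σ_i) the clip's label confidence Gaussian):

```latex
p(u_j \mid s_i) \;=\;
\frac{\mathcal{N}(e_{i,j} \mid \mu_i, \Sigma_i)}
     {\sum_{j'=1}^{U_i} \mathcal{N}(e_{i,j'} \mid \mu_i, \Sigma_i)}
```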

Page 30

Prior Model for Emotion Annotation

• Based on the clip-level user prior, we further derive the corpus-level clip prior to describe the importance of each clip, as follows.
• If the annotators of a clip are more consistent, the clip is considered less subjective, and the model learned from it is more reliable.

30

Page 31

Prior Model for Emotion Annotation

• Both cases lead to larger cumulative annotation confidence likelihoods.
• With these two priors, we define the annotation prior as the product of both.

31

Page 32

Learning the VA GMM

• We assume that each annotation e_{i,j} of clip s_i in X is generated from a weighted VA GMM, whose weights are governed by the acoustic GMM posterior θ_i of the clip.

32

μ_k^v: mean vector of the k-th latent VA Gaussian
Σ_k^v: covariance matrix of the k-th latent VA Gaussian
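The generating distribution itself was lost in extraction; reconstructed with the symbols above (θ_{i,k} being the clip-level acoustic GMM posterior), it takes the form:

```latex
p(e_{i,j} \mid s_i) \;=\; \sum_{k=1}^{K} \theta_{i,k}\,
\mathcal{N}\!\left(e_{i,j} \mid \mu_k^{v}, \Sigma_k^{v}\right)
```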

Page 33

Learning the VA GMM

• First, for each clip s_i, we derive its acoustic prior, which is the acoustic GMM posterior vector θ_i, and the annotation prior for each of its annotations.
• These two priors are computed beforehand and stay fixed during the whole learning process.
• Using E to denote the whole set of annotations in X, the total likelihood p(E|X) can then be derived.

33

Page 34

Learning the VA GMM

• Taking the logarithm of p(E|X), where the weights satisfy Σ_k θ_{i,k} = 1:

To learn the VA GMM, we maximize the log-likelihood L with respect to the parameters of the VA GMM.

34

Page 35

Learning the VA GMM

• But this is intractable, as a two-layer summation over the latent variables appears inside the logarithm.
• We first derive a lower bound of L according to Jensen's inequality.
• Replacing L by this lower bound, the logarithm acts after only a single summation over the latent variable.
• Thus, we use the EM method.

35
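Per annotation, Jensen's inequality moves the logarithm inside the weighted sum, giving a lower bound of the kind invoked above (a sketch in the notation of the previous slides, not the paper's exact bound):

```latex
\log \sum_{k=1}^{K} \theta_{i,k}\, \mathcal{N}(e_{i,j} \mid \mu_k^{v}, \Sigma_k^{v})
\;\ge\;
\sum_{k=1}^{K} \theta_{i,k} \log \mathcal{N}(e_{i,j} \mid \mu_k^{v}, \Sigma_k^{v})
```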

Page 36

Learning the VA GMM

• Expectation-Maximization method
  • In the E-step, we derive the expectation over the posterior probability of the latent class, given a user's annotation for the clip, where
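The expression following "where" was lost in extraction; the standard GMM responsibility consistent with this model would be:

```latex
\gamma_{k}(e_{i,j}) \;=\;
\frac{\theta_{i,k}\, \mathcal{N}(e_{i,j} \mid \mu_k^{v}, \Sigma_k^{v})}
     {\sum_{k'=1}^{K} \theta_{i,k'}\, \mathcal{N}(e_{i,j} \mid \mu_{k'}^{v}, \Sigma_{k'}^{v})}
```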

36

Page 37

Learning the VA GMM

• Expectation-Maximization method
  • In the M-step, we maximize Q and derive the following update rules.

37

Page 38

Learning the VA GMM

• Expectation-Maximization method: algorithm

38

Page 39

Emotion-based Annotation and Retrieval using AEG

• The VA GMM can generate the emotion annotation of an unseen clip given the clip's acoustic GMM posterior, as follows.
• We can then resort to information theory to calculate the mean and covariance of a representative single VA Gaussian, by solving the following optimization problem,
• where KL(·‖·) denotes the one-way KL divergence from the predicted VA GMM to the single Gaussian.

39
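Minimizing the one-way KL divergence from a GMM to a single Gaussian has a closed-form moment-matching solution, which is presumably the result applied here (θ_k denoting the clip's posterior weights):

```latex
\hat{\mu} = \sum_{k} \theta_{k}\, \mu_k^{v},
\qquad
\hat{\Sigma} = \sum_{k} \theta_{k}
\left( \Sigma_k^{v} + (\mu_k^{v} - \hat{\mu})(\mu_k^{v} - \hat{\mu})^{\top} \right)
```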

Page 40

Emotion-based Annotation and retrieval using AEG

40

Page 41

Emotion-based Annotation and Retrieval using AEG
• Emotion-based music retrieval
  • Given a query point, the system has to fold the point into the AEG model and estimate a pseudo song in an online fashion.
  • This can be done by maximizing the weighted VA GMM likelihood, as follows.

41

Page 42

Emotion-based Annotation and Retrieval using AEG

• This can also be solved with the EM algorithm:
  • In the E-step, compute the posterior responsibility of each latent VA Gaussian for the query point.
  • In the M-step, update the pseudo-song weights accordingly.
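The update equations were lost in extraction; assuming the standard fold-in EM for mixture weights with a single query point q and pseudo-song weights γ, they would look like:

```latex
\text{E-step:}\quad
r_k \;=\; \frac{\gamma_k\, \mathcal{N}(q \mid \mu_k^{v}, \Sigma_k^{v})}
               {\sum_{k'} \gamma_{k'}\, \mathcal{N}(q \mid \mu_{k'}^{v}, \Sigma_{k'}^{v})},
\qquad
\text{M-step:}\quad
\gamma_k \;\leftarrow\; r_k
```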

42

Page 43

Interpretation of AEG

• Music corpora / databases:

43

                              MER60               MTurk
Number of clips               60                  240
Clip duration (sec)           30                  15
Subjects                      99                  n/a
Subjects per annotated clip   40                  7~23
Songs                         English pop songs   US pop

References:
[MTurk] J. A. Speck, E. M. Schmidt, B. G. Morton, and Y. E. Kim. A comparative study of collaborative vs. traditional musical mood annotation. In ISMIR, 2011.
[MER60] Y.-H. Yang and H. H. Chen. Predicting the distribution of perceived emotions of a music signal for content retrieval. IEEE Trans. Audio, Speech and Lang. Proc., 19(7):2184-2196, 2011.

Page 44

Interpretation of AEG

• Frame-based acoustic features
  • MIRtoolbox 1.3 [14] is used to extract the following four types of frame-level features:

44

Type      Dim  Features
Dynamic   1    RMS (energy)
Spectral  11   spectral centroid, spread, skewness, kurtosis, entropy, flatness, roll-off 85%, roll-off 95%, brightness, roughness, and irregularity
Timbre    41   zero-crossing rate, spectral flux, 13 MFCCs, 13 delta-MFCCs, and 13 delta-delta MFCCs
Tonal     14   key clarity, key mode possibility, HCDF, 12-bin chroma, chroma peak, and chroma centroid

Page 45

Interpretation of AEG
• Because the audio of the MTurk database is not available, the features kindly provided by its creators are used instead.

45

Type                  Dim  Features
MFCCs                 20   Low-dimensional representation of the spectrum, warped according to the Mel scale.
Chroma                12   Autocorrelation of chroma is used, providing an indication of modality.
Spectrum descriptors  4    Spectral centroid, flux, roll-off, and flatness; often related to timbral texture.
Spectral contrast     14   Rough representation of the harmonic content in the frequency domain.

Page 46

Evaluation and Results
• Evaluation setting and metrics
  • The AEG model can predict the emotion of an input music piece as a VA GMM or as a single VA Gaussian.
  • Measuring the distance between two GMMs is complicated and sometimes inaccurate, so accuracy is evaluated on the single-Gaussian prediction.
• The performance is evaluated with two metrics:
  • One-way KL divergence (relative entropy), reported as the average KL divergence (AKL).
  • Euclidean distance between the mean vectors of the predicted and ground-truth VA Gaussians of a clip, reported as the average mean Euclidean distance (AMD).

46

Page 47

Evaluation and results

• For the emotion-based music retrieval task, the user inputs a query point on the VA plane and receives a ranked list of pieces from an unlabeled database.
  • We randomly generate 500 2-D points uniformly covering the available VA space to form a test query set.
  • To evaluate retrieval performance on this query set, we first derive the ground-truth relevance R(i) between a query point and the i-th piece in the database, by feeding the query point into the ground-truth Gaussian PDF of the i-th piece.
  • For each test query point, the ground-truth relevance is incorporated into the normalized discounted cumulative gain at truncation depth P (NDCG@P) [10] to evaluate the ranking of pieces.

47

[10] K. Jarvelin and J. Kekalainen. Cumulated gain-based evaluation of IR techniques. ACM Trans. on Info. Syst., 20(4):422-446, 2002.
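As a concrete illustration of the metric, here is a minimal NDCG@P sketch with graded relevance (one common DCG variant; the relevance values are made up, not from the paper):

```python
# Sketch of NDCG@P with graded relevance, as used in the retrieval evaluation.
import numpy as np

def dcg_at_p(relevances, p):
    """Discounted cumulative gain over the top-P ranked items."""
    rel = np.asarray(relevances, dtype=float)[:p]
    discounts = np.log2(np.arange(2, rel.size + 2))  # log2(rank + 1), ranks 1..P
    return float(np.sum(rel / discounts))

def ndcg_at_p(ranked_relevances, p):
    """DCG of the system ranking normalized by the ideal (sorted) ranking."""
    ideal = sorted(ranked_relevances, reverse=True)
    idcg = dcg_at_p(ideal, p)
    return dcg_at_p(ranked_relevances, p) / idcg if idcg > 0 else 0.0

# Ground-truth relevance R(i) of each retrieved clip, in system rank order
# (illustrative values only).
ranked = [0.8, 0.2, 0.9, 0.1]
score = ndcg_at_p(ranked, p=3)
print(score)
```

A perfect ranking (relevances already sorted in decreasing order) yields NDCG@P = 1.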

Page 48

Evaluation and results

48

Page 49

Evaluation and results

49

Page 50

Conclusion and Future Work

• This paper has presented the novel acoustic emotion Gaussians model, which provides a principled probabilistic framework for music emotion analysis.
  • The learning of AEG was interpreted and insights were provided.
  • A comprehensive evaluation of automatic music emotion annotation on two emotion-annotated music corpora has demonstrated the effectiveness of AEG over state-of-the-art methods.
  • The potential application of AEG to emotion-based music retrieval was also demonstrated.

50

Page 51

Conclusion and Future Work

• Future work• The probabilistic framework provides a blueprint for multi-modal

music emotion modeling.

• With a set of learned latent feature classes, multimodal features can be aligned to one another.

• Investigate personalized music emotion recommendation.

• Study non-parametric Bayesian methods for the modeling of music emotion, and to investigate whether we can get better results without assuming that the VA space is Euclidean.

51