Face Detection and Emotional Speech Recognition Techniques

Brown Bag Seminar

Part I: Face Detection Techniques

Part II: Emotional Speech Recognition Techniques

2011. 3. 11 (Fri).

김성호

Department of Electronic Engineering, Yeungnam University

Part I: Face Detection Research [presented at the IPIU 2011 conference]

Motivation

Proposed Object Representation Scheme

Viewpoint: for a 2D object, (object center, scale); for a 3D object, the 3D object pose

Figure/ground mask: boundary shape, figure/ground information

Local appearance: appearance codebook, part pose

Joint appearance and shape model

Visual Context in the Joint Appearance & Shape Model

How to integrate those contextual cues?

Weak neighbor support

Strong neighbor support

Cooperative

BU+TD

Spatial context / Hierarchical context

Part-part context (bottom-up)

Object-background context (top-down)

Part-whole context (bottom-up/top-down)

Grouping property

Supporting contextually related category

Predicting figure-ground

Use a graphical model, specifically a directed graphical model (Bayesian network)

Mathematical Formulation for Categorization (1/2)

Solution $H = (C, V, M)$: category label, viewpoint, mask

$G$: input feature

$D$: example-based model

Key issue: the prior is difficult to model because it is complex and high-dimensional

Our approach

[Figure: directed graphical model. Node labels: V (viewpoint), M (figure-ground), F (codebook index), G (appearance A, pose X), {C,B}; plate N; top-down and bottom-up paths; part labels b1-b6 and f1-f5]

Learning for Distributed Category Representation

CC: Category specific Codebook for top-down inference

UC: Universal Codebook for bottom-up inference

Joint appearance and boundary with viewpoint

Car Airplane

Issue: how to select the optimal codebook (CB) for category representation?

Previous constellation model: fixed number of parts, so it cannot handle large variations

Why distributed? To handle large intra-class variations

Codebook Selection: Reducing Surface Markings

Focus: which codebook can reduce the effect of surface markings?

Our strategy: intermediate blurring

Statistical property: entropy

Repeatable part / Surface marking part

Low entropy: surface markings

High entropy: semantic parts

Entropy of Candidate Codebook

Finding: a high-entropy codebook should be selected to reduce surface markings

$$H(L \mid F) = -\sum_{l \in L} p(l \mid F) \log_2 p(l \mid F)$$

$L$: set of labels of object instances

$F$: candidate codebook
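
A minimal sketch of the entropy criterion, for illustration only. It assumes p(l | F) is estimated from how often a candidate codebook entry F matches each object instance label l; the estimation rule, the function name `codebook_entropy`, and the toy labels are assumptions, not taken from the slides.

```python
import math
from collections import Counter

def codebook_entropy(instance_labels):
    """Entropy H(L|F) in bits for one candidate codebook entry F.

    instance_labels: labels of the object instances that F matched.
    p(l|F) is estimated as the relative match frequency (an assumption).
    """
    counts = Counter(instance_labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical toy data: a surface marking fires mostly on one instance,
# while a semantic part (e.g. a wheel) fires evenly on many instances.
surface_marking = ["car_03"] * 7 + ["car_11"]
semantic_part = ["car_%02d" % i for i in range(8)]

print(codebook_entropy(surface_marking))  # low entropy
print(codebook_entropy(semantic_part))    # high entropy
```

Under this estimate, an entry that repeats across many instances gets high entropy, which matches the selection rule on the slide.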

Part-whole context

Part-part context (estimate weight)

Inference Flow related to Category Model

[Figure: inference flow through the category model. Labels: input, dense feature, matching to UC, grouping (similarity & proximity), UCB, CCB (car, airplane, …) + background CB, car category, multi-modal viewpoint, multi-modal figure-ground mask, final result]

Demo of Categorization and Segmentation

Category Detection: Caltech Face Dataset [DB1]

About the face DB: 435 face images with clutter; 468 background images

Learning: randomly select 15 faces and 15 background images

Test: 200 novel face images and 200 novel background images

[DB1] http://www.robots.ox.ac.uk/~vgg/data3.html
[Weber00] M. Weber, M. Welling, and P. Perona, “Unsupervised learning of models for recognition”, In Proc. ECCV, pp. 18–32, 2000.
[Fergus03] R. Fergus, P. Perona, and A. Zisserman, “Object class recognition by unsupervised scale invariant learning”, In CVPR, 2003.
[Shotton05] J. Shotton, A. Blake, and R. Cipolla, “Contour-based learning for object detection”, In ICCV, 2005.

Method        Ntrain (unsegmented)   Ntrain (segmented)   ROC EER (region error < 25%)
[Weber00]     200                    0                    94.0%
[Fergus03]    220                    0                    96.4%
[Shotton05]   50                     10                   96.5%
Ours          0                      15                   97.3%
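
For reference, the ROC equal error rate can be computed from detector scores roughly as follows. This is a sketch under assumptions: the function name `roc_equal_error_rate` and the toy scores are hypothetical, and the table values are presumably the detection rate at the equal-error point (1 minus the error rate returned here), not the error rate itself.

```python
def roc_equal_error_rate(pos_scores, neg_scores):
    """Error rate at the threshold where the false-negative rate and
    false-positive rate are closest (the ROC equal-error point)."""
    best_gap, best_eer = None, None
    for t in sorted(set(pos_scores) | set(neg_scores)):
        # A sample is called "face" when its score is >= t.
        fnr = sum(s < t for s in pos_scores) / len(pos_scores)
        fpr = sum(s >= t for s in neg_scores) / len(neg_scores)
        gap = abs(fnr - fpr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (fnr + fpr) / 2
    return best_eer

# Hypothetical detector scores for face and background test images.
face_scores = [0.9, 0.8, 0.7, 0.4]
background_scores = [0.6, 0.3, 0.2, 0.1]
eer = roc_equal_error_rate(face_scores, background_scores)
print("EER:", eer, "detection rate at EER:", 1 - eer)
```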

Examples of Face Detection

[Figure panels: test image; bottom-up viewpoints; bottom-up mask; hypothesized viewpoint; hypothesized mask; final inference result by boosted MCMC]

Test Results in Real Scene (KAIST)

Note: the model is learned from the Caltech DB and tested on real images.

Conclusions and Discussions

Joint appearance and boundary with viewpoint is a suitable object model for object categorization in cluttered scenes.

Visual contexts (part-part, part-whole, object-background) can disambiguate figure-ground.

A Bayesian network can model both categorization and figure-ground segmentation.

Boosted MCMC can provide efficient inference for cluttered objects.

Future work: modeling a more flexible figure-ground mask; using boundary shape in the likelihood calculation

Part II: Emotional Speech Recognition Research - Introduction

Speech: a sequence of elementary acoustic symbols

Information in speech: gender, age, accent, speaker's identity, health, and emotion

Emotional speech recognition: this area has recently attracted increased attention

Convergence project: supports quantitative analysis of anti-Korean sentiment.

Structure of Emotional Speech Recognition

Key components: feature extractor and classifier

Feature extractor (MFCC) → Classifier (SVM or nearest class mean classifier) → Recognized emotions

Feature for Emotional Speech Recognition

Mel Frequency Cepstral Coefficients (MFCC): convey the short-time energy distribution in the frequency domain

Signal

Fourier transform (frequency domain)

Mapping the power spectrum onto the mel scale

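Take the log of the power at each mel frequency

Final MFCC: amplitudes of the resulting spectrum (after a discrete cosine transform of the log mel powers)

Mel scale: a frequency spacing that matches the pitch differences humans perceive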
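
The pipeline above can be sketched for a single frame in plain Python. This is an illustrative sketch, not the extractor used in the experiments: the Hamming window, the 2595·log10(1 + f/700) mel formula, the triangular filterbank, the filter and coefficient counts, and all function names are common choices assumed here.

```python
import cmath
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Naive DFT power spectrum of a Hamming-windowed frame (bins 0..N/2)."""
    n = len(frame)
    win = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
           for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(win))
        spec.append(abs(s) ** 2 / n)
    return spec

def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters whose centers are equally spaced on the mel scale."""
    max_mel = hz_to_mel(sample_rate / 2)
    mels = [max_mel * i / (num_filters + 1) for i in range(num_filters + 2)]
    bins = [mel_to_hz(m) * n_fft / sample_rate for m in mels]  # fractional bins
    bank = []
    for j in range(1, num_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            if left < k <= center:
                row.append((k - left) / (center - left))
            elif center < k < right:
                row.append((right - k) / (right - center))
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mfcc_frame(frame, sample_rate, num_filters=20, num_coeffs=12):
    spec = power_spectrum(frame)                            # Fourier transform
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    log_mel = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
               for row in bank]                             # mel mapping + log
    # DCT-II of the log mel powers; its amplitudes are the MFCCs.
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / num_filters)
                for j, e in enumerate(log_mel))
            for c in range(num_coeffs)]

# Hypothetical usage: one 256-sample frame of a 440 Hz tone at 8 kHz.
tone = [math.sin(2 * math.pi * 440 * i / 8000) for i in range(256)]
print(mfcc_frame(tone, 8000))
```

A real extractor would slide this over overlapping frames and use an FFT; the naive DFT here only keeps the sketch dependency-free.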

Classifier: Support Vector Machine

Feature space

Learning: finding the optimal classifier

Recognition: Performed by the learned classifier
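
A rough sketch of the learning and recognition steps for the binary case, assuming a linear soft-margin SVM trained by stochastic subgradient descent on the hinge loss. The slides do not state which kernel or solver was used, so the training rule, the hyperparameters, and the names `train_linear_svm` and `predict_svm` are assumptions; multi-class emotion recognition would additionally need a scheme such as one-vs-rest.

```python
def train_linear_svm(features, labels, lam=0.01, lr=0.1, epochs=200):
    """Learning: find (w, b) for a linear soft-margin SVM.

    labels must be +1 or -1. Uses subgradient descent on
    lam/2 * |w|^2 + hinge loss (an assumed, simple training rule).
    """
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            w = [wi * (1.0 - lr * lam) for wi in w]  # regularization step
            if margin < 1.0:                         # sample violates the margin
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict_svm(w, b, x):
    """Recognition: sign of the learned decision function."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Hypothetical 2-D features for two classes.
X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print(predict_svm(w, b, [2.5, 2.5]), predict_svm(w, b, [-2.5, -2.5]))
```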

Classifier: Nearest Class Mean

Feature space

Learning: Finding class means

Recognition: Finding nearest class
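
The two steps of the nearest class mean classifier can be sketched as below. Euclidean distance, the function names, and the toy feature vectors are assumptions for illustration; in the experiments the features would be MFCC vectors.

```python
import math

def fit_class_means(features, labels):
    """Learning: compute the mean feature vector of each class."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict_nearest_mean(means, x):
    """Recognition: return the class whose mean is closest to x."""
    return min(means, key=lambda y: math.dist(means[y], x))

# Hypothetical 2-D features standing in for MFCC vectors.
train_x = [[1, 1], [1, 2], [8, 8], [9, 8]]
train_y = ["sad", "sad", "happy", "happy"]
means = fit_class_means(train_x, train_y)
print(predict_nearest_mean(means, [2, 1]))
```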

Exp.1 on EMO Database

Composition: 7 emotion categories (happy, angry, anxious, fearful, bored, disgusted, neutral); 10 sentences; 10 voice actors (5 male, 5 female); language: German

anger

happy

boredom

Recognition using Nearest Class Mean Classifier

Learning: 150 samples (randomly selected); test: 150 samples

Recognition rate: 47.0%

Recognition using SVM

Recognition rate: 38.0%

The nearest class mean classifier outperforms the SVM.

Exp. 2: Learning on German, Testing on Japanese

Surprise

Sadness

Joy

Recognition is unstable because of the differences between German and Japanese.

Exp. 3: Learning on Japanese, Testing on Japanese

'neutral'

'anger'

'happy'

'fright'

'sad'

DB composition: 5 emotions, 57 audio clips (episode 4 of 'Clouds Above the Hill' (언덕 위의 구름))

Recognition result: nearest class mean classifier

56.7%

Recognition result: SVM

86.6%

The SVM performs better.

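Conclusions and Future Work

Conclusions

Developed MFCC feature extraction and classifiers (SVM, nearest class mean classifier)

Recognition of 7 German emotions reaches at most 47%.

Learning on German and recognizing Japanese emotions performs very poorly.

Learning on Japanese and recognizing Japanese emotions reaches 86.6%.

Future work

Reselect the emotion categories suited to 'Clouds Above the Hill'

Collect a larger DB and run more experiments

Derive and analyze overall emotion statistics for 'Clouds Above the Hill'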
