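Face Detection and Emotional Speech Recognition Techniques
Brown Bag Seminar
Part I: Face Detection Techniques
Part II: Emotional Speech Recognition Techniques
March 11, 2011 (Fri)
김성호
Yeungnam University, Department of Electronic Engineering
Part I: Face Detection Research [presented at IPIU 2011]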
Motivation
Proposed Object Representation Scheme
Viewpoint: for a 2D object, (object center, scale); for a 3D object, 3D object pose
Figure/ground mask: boundary shape, figure/ground information
Local appearance: appearance codebook, part pose
Joint appearance and shape model
Visual Context in the Joint Appearance & Shape Model
How to integrate those contextual cues?
Weak neighbor support
Strong neighbor support
Cooperative
BU+TD
Spatial context, hierarchical context
Part-part context (bottom-up)
Object-background context (top-down)
Part-whole context (bottom-up/top-down)
Grouping property
Supporting contextually related category
Predicting figure-ground
Use a graphical model, specifically a directed graphical model (Bayesian net)
Mathematical Formulation for Categorization (1/2)
Solution: $H = (C, V, M)$ (category label, viewpoint, mask)
$G$: input feature
$D$: example-based model
Key issue: the prior is difficult to model because it is complex and high-dimensional
Our approach
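[Figure: directed graphical model. Nodes: V (viewpoint), M (figure-ground), F (codebook index), A, X, G, {C,B}; plate size N; top-down and bottom-up directions; NN links; appearance and pose labels; example parts labeled f1-f5 and b1-b6.]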
Learning for Distributed Category Representation
CC: Category specific Codebook for top-down inference
UC: Universal Codebook for bottom-up inference
Joint appearance and boundary with viewpoint
Car Airplane
Issue: How to select the optimal codebook (CB) for category representation?
Previous constellation model: fixed number of parts, so it cannot handle large variations
Why distributed? To handle large intra-class variations
Codebook Selection: Reducing Surface Markings
Focus: Which codebook can reduce the effect of surface markings?
Our strategy: intermediate blurring
Statistical property: entropy
Repeatable part, surface-marking part
Entropy of Candidate Codebook
$H(L \mid F) = -\sum_{l \in L} p(l \mid F) \log_2 p(l \mid F)$
$L$: set of labels of object instances
$F$: candidate codebook
Low entropy: surface markings
High entropy: semantic parts
Finding: high-entropy codebooks should be selected to reduce surface markings
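The entropy criterion above can be sketched in a few lines. This is a minimal illustration, not the speaker's implementation: it assumes p(l | F) is estimated as the relative frequency of instance labels among the patches matched to one candidate codebook entry, and the function name and toy labels are hypothetical.

```python
import math
from collections import Counter

def codebook_entropy(instance_labels):
    """Entropy H(L|F) in bits of the instance labels matched to one codebook entry F."""
    counts = Counter(instance_labels)
    total = sum(counts.values())
    entropy = 0.0
    for count in counts.values():
        p = count / total  # p(l | F), estimated by relative frequency (assumption)
        entropy -= p * math.log2(p)
    return entropy

# Hypothetical matches: the training instance each matched patch came from.
repeatable_part = ["car1", "car2", "car3", "car4"]  # appears on every instance
surface_marking = ["car1", "car1", "car1", "car1"]  # appears on one instance only

print(codebook_entropy(repeatable_part))  # 2.0 bits
print(codebook_entropy(surface_marking))  # 0.0 bits
```

An entry matched evenly across instances scores high and would be kept; an entry tied to a single instance scores zero and would be treated as a surface marking.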
Inference Flow related to Category Model
[Figure: inference flow. Context labels: part-whole context; part-part context (estimate weight).]
Input: dense features
Matching to UC
Grouping (similarity & proximity)
Category model: CCB (Car, Airplane, …) + background CB; UCB
Car category
Multi-modal viewpoint
Multi-modal figure-ground mask
Final result
Demo of Categorization and Segmentation
Category Detection: Caltech Face Dataset [DB1]
About the face DB: 435 face images with clutter, 468 background images
Learning: randomly select 15 faces and 15 background images
Test: 200 novel face images, 200 novel background images
[DB1] http://www.robots.ox.ac.uk/~vgg/data3.html
[Weber00] M. Weber, M. Welling, and P. Perona, “Unsupervised learning of models for recognition”, In Proc. ECCV, pp. 18–32, 2000.
[Fergus03] R. Fergus, P. Perona, and A. Zisserman, “Object class recognition by unsupervised scale invariant learning”, In Proc. CVPR, 2003.
[Shotton05] J. Shotton, A. Blake, and R. Cipolla, “Contour-based learning for object detection”, In Proc. ICCV, 2005.
Method        Ntrain (unsegmented)   Ntrain (segmented)   ROC EER (region error < 25%)
[Weber00]     200                    0                    94.0%
[Fergus03]    220                    0                    96.4%
[Shotton05]   50                     10                   96.5%
Ours          0                      15                   97.3%
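For readers unfamiliar with the ROC EER column, the sketch below shows one common way to compute such a figure from detector scores. It assumes the reported percentage is the detection rate at the ROC operating point where the false positive rate equals the false negative rate. The function name and the toy scores are hypothetical and are not taken from the experiments.

```python
def roc_equal_error_rate(pos_scores, neg_scores):
    """Return the detection rate at the threshold where FPR is closest to FNR."""
    thresholds = sorted(set(pos_scores) | set(neg_scores))
    best_gap, best_rate = None, None
    for t in thresholds:
        # A sample is accepted as a face when its score is >= t.
        fnr = sum(s < t for s in pos_scores) / len(pos_scores)
        fpr = sum(s >= t for s in neg_scores) / len(neg_scores)
        gap = abs(fpr - fnr)
        if best_gap is None or gap < best_gap:
            best_gap = gap
            best_rate = 1.0 - (fpr + fnr) / 2.0
    return best_rate

# Hypothetical detector scores for face (positive) and background (negative) images.
face_scores = [0.9, 0.8, 0.7, 0.4]
background_scores = [0.6, 0.3, 0.2, 0.1]

print(roc_equal_error_rate(face_scores, background_scores))  # 0.75
```

With these toy scores, the threshold 0.6 misses one face in four and accepts one background image in four, so the two error rates meet at 25% and the reported rate is 75%.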
Examples of Face Detection
[Figure panels: test image; bottom-up viewpoints; bottom-up mask; hypothesized viewpoint; hypothesized mask; final inference result by Boosted MCMC]
Test Results in Real Scene (KAIST)
Note: The model is learned from the Caltech DB and tested on real images.
Conclusions and Discussions
A joint appearance and boundary model with viewpoint is a suitable object model for object categorization in cluttered scenes.
Visual contexts (part-part, part-whole, object-background) can resolve ambiguous figure-ground assignments.
A Bayesian net can model both categorization and figure-ground segmentation.
Boosted MCMC can provide efficient inference for cluttered objects.
Future work: modeling a more flexible figure-ground mask; using boundary shape in the likelihood calculation
Part II: Emotional Speech Recognition Research - Introduction
Speech: a sequence of elementary acoustic symbols
Information in speech: gender, age, accent, speaker's identity, health, and emotion
Emotional speech recognition: attention to this area has increased recently
Convergence project: supports quantitative analysis of anti-Korean sentiment
Structure of Emotional Speech Recognition
Key components: feature extractor (MFCC), classifier (SVM or nearest class mean classifier)
Output: recognized emotions
Feature for Emotional Speech Recognition
Mel Frequency Cepstral Coefficients (MFCC): convey short-time energy information in the frequency domain
Signal
Fourier transform (frequency domain)
Map the power spectrum onto the mel scale
Take the log of the power at each mel frequency
Final MFCC: amplitudes of the resulting spectrum (after a discrete cosine transform of the mel log powers)
Mel scale: a frequency scale whose steps match the pitch differences that humans perceive
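The MFCC steps listed above can be sketched for a single frame using only the Python standard library. This is an illustrative sketch, not the extractor used in the experiments: the Hamming window, the triangular filterbank, the 2595/700 mel formula, the filter and coefficient counts, and all function names are assumptions, and the naive DFT is only practical for short frames.

```python
import cmath
import math

def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))
        spectrum.append(abs(coeff) ** 2 / n)
    return spectrum

def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters whose centers are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    mel_points = [top * i / (num_filters + 1) for i in range(num_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mel_points]
    filters = []
    for j in range(1, num_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        weights = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):      # rising slope
            weights[k] = (k - left) / (center - left)
        for k in range(center, right):     # falling slope
            weights[k] = (right - k) / (right - center)
        filters.append(weights)
    return filters

def mfcc(frame, sample_rate, num_filters=10, num_coeffs=5):
    n = len(frame)
    # Hamming window to reduce spectral leakage (an assumption, not from the slides).
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    power = power_spectrum(windowed)
    filters = mel_filterbank(num_filters, n, sample_rate)
    # Log of the energy in each mel band; the small constant avoids log(0).
    log_energies = [math.log(sum(w * p for w, p in zip(f, power)) + 1e-10)
                    for f in filters]
    # DCT-II of the log energies; its amplitudes are the MFCCs.
    return [sum(e * math.cos(math.pi * c * (m + 0.5) / num_filters)
                for m, e in enumerate(log_energies))
            for c in range(num_coeffs)]

# Hypothetical input: one 256-sample frame of a 440 Hz tone sampled at 8 kHz.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 440 * i / sample_rate) for i in range(256)]
coefficients = mfcc(frame, sample_rate)
print(coefficients)
```

In practice a clip would be cut into many overlapping frames and the per-frame coefficients pooled or stacked before classification; the slides do not say how this was done.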
Classifier: Support Vector Machine
[Figure: feature space]
Learning: finding the optimal classifier
Recognition: Performed by the learned classifier
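As a rough illustration of the learning and recognition steps, here is a two-class linear SVM trained by subgradient descent on the regularized hinge loss. The slides do not state the kernel, the multi-class scheme, or the solver, so everything here (linear kernel, two classes, hyperparameters, function names, toy features) is an assumption made for the sketch.

```python
def train_linear_svm(features, labels, learning_rate=0.1, reg=0.01, epochs=200):
    """Fit weights and bias on the hinge loss. Labels must be +1 or -1."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            margin = y * (sum(w * v for w, v in zip(weights, x)) + bias)
            if margin < 1.0:
                # Sample is inside the margin: push the boundary away from it.
                weights = [w + learning_rate * (y * v - reg * w)
                           for w, v in zip(weights, x)]
                bias += learning_rate * y
            else:
                # Sample is safely classified: apply weight decay only.
                weights = [w - learning_rate * reg * w for w in weights]
    return weights, bias

def classify(weights, bias, x):
    score = sum(w * v for w, v in zip(weights, x)) + bias
    return 1 if score >= 0 else -1

# Hypothetical 2-D features for two emotion classes (+1 and -1).
train_x = [[-1.0, -1.0], [-1.2, -0.8], [1.0, 1.2], [0.8, 1.0]]
train_y = [-1, -1, 1, 1]

weights, bias = train_linear_svm(train_x, train_y)
print(classify(weights, bias, [0.9, 1.1]))    # 1
print(classify(weights, bias, [-1.1, -0.9]))  # -1
```

Recognizing several emotions would require combining such binary classifiers, for example one-vs-rest, which is again an assumption rather than something the slides specify.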
Classifier: Nearest Class Mean
Feature space
Learning: Finding class means
Recognition: Finding nearest class
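The two steps above map directly onto code. The sketch below assumes Euclidean distance and fixed-length feature vectors (for example, pooled MFCCs); the function names and toy data are hypothetical.

```python
import math

def fit_class_means(features, labels):
    """Learning: compute the mean feature vector of each class."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

def predict(class_means, x):
    """Recognition: return the class whose mean is nearest to x."""
    return min(class_means, key=lambda y: math.dist(class_means[y], x))

# Hypothetical 2-D features for two emotions.
train_x = [[1.0, 1.0], [1.2, 0.8], [4.0, 4.2], [3.8, 4.0]]
train_y = ["anger", "anger", "happy", "happy"]

class_means = fit_class_means(train_x, train_y)
print(predict(class_means, [1.0, 1.2]))  # anger
print(predict(class_means, [4.1, 3.9]))  # happy
```

Because the model is just one mean per class, it needs very little training data, which may be relevant when comparing it with SVM on small databases.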
Exp.1 on EMO Database
Composition: 7 emotions (happy, angry, anxious, fearful, bored, disgusted, neutral); 10 sentences; 10 voice actors (5 male, 5 female); language: German
Examples: anger, happy, boredom
Recognition using Nearest Class Mean Classifier
Learning: 150 (randomly selected), test: 150
Recognition rate: 47.0%
Recognition using SVM
Recognition rate: 38.0%
The nearest class mean classifier outperforms SVM.
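Exp. 2: Train on German, test on Japanese
[Figure labels: surprise, sadness, joy]
Recognition is unstable because of the differences between German and Japanese.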
Exp. 3: Train on Japanese, test on Japanese
Emotion labels: 'neutral', 'anger', 'happy', 'fright', 'sad'
DB composition: 5 emotions, 57 audio clips (episode 4 of '언덕 위의 구름')
Recognition results: nearest class mean classifier
Recognition rate: 56.7%
Recognition results: SVM
Recognition rate: 86.6%
SVM performs better.
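Conclusions and Future Work
Conclusions
Developed MFCC feature extraction and recognizers (SVM, nearest class mean classifier).
Recognition of the 7 German emotions reaches at most 47%.
Recognition of Japanese emotions after training on German is very poor.
Recognition of Japanese emotions after training on Japanese reaches 86.6%.
Future work
Reselect the emotion categories suited to '언덕 위의 구름'.
Collect more data and run more experiments.
Derive and analyze overall emotion statistics for '언덕 위의 구름'.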