SEOUL | Oct.7, 2016
Keun-dong Lee, DaUn Jeong, Seungjae Lee, Hyung Kwan Son
(ETRI VisualBrowsing Team)
NVIDIA DEEP LEARNING CONTEST 2016 : Food Image Classification Using Deep Learning
AGENDA
1. Contest Outline
2. Dataset & Difficulties
3. Training
4. Test & Results
5. Analysis
CONTEST
• Task : Food Image Classification
• Measure : Top-1 Accuracy
Assigned topic : Image classification using deep learning
Bossam (boiled pork wraps) French Toast Pizza
DATASET
• Subset of Food-101 dataset (ETH) + Korean Food
Food-101 : https://www.vision.ee.ethz.ch/datasets_extra/food-101
• 120 Classes (No background class)
• Training set : 104,089 images
• Test set : 11,565 images
DIFFICULTIES
Baby back ribs French fries Lobster roll sandwich
The potato is everywhere! Fine-grained
Hot dog
Non-food
Gyoza Dumplings Steak Filet mignon
Intra-class variations
Bibimbap Doenjang-jjigae (soybean paste stew)
TRAINING OUTLINE
• Training / Validation set random split (80%:20%) using DIGITS
• No outside data
• Hardware : Miruware DIGITS devbox (4 x Titan X 12GB)
• Software : DIGITS, Torch
• Pre-trained CNN model : 200-layer Residual Network (ResNet-200) trained for ImageNet classification
• Fine-tuning on the provided dataset
• Two types of Random data augmentation : AUG_A, AUG_B
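The 80%:20% random split was done with DIGITS. As a rough illustration only, the same idea can be sketched in plain Python. The file names, the seed, and the `split_train_val` helper are assumptions made for this example, and DIGITS may split differently in detail (for instance, per class).

```python
import random


def split_train_val(items, val_fraction=0.2, seed=0):
    """Shuffle a copy of items and split it into (train, val) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_val = int(round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


# Hypothetical file names standing in for the 104,089 training images.
images = ["img_%06d.jpg" % i for i in range(104089)]
train80, val20 = split_train_val(images)
print(len(train80), len(val20))
```

Fixing the seed keeps the split reproducible, so Train80 and Val20 stay the same across runs.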
TRAINING DATA AUGMENTATION
• Data Augmentation
◦ Too few training images relative to the number of deep neural network parameters (VGG-16 : 138M parameters)
◦ Random data augmentation has the effect of expanding the training data
◦ Simplest augmentation : Horizontal Flip
◦ Augmentations used : Random Scale / Random Crop / Aspect Ratio / Horizontal Flip / Color
A way of viewing each training image in many different ways
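The slides do not give the parameters of AUG_A or AUG_B. The sketch below only illustrates how random scale, aspect ratio, crop position, and horizontal flip can be sampled together for each training image. The ranges in `scale_range` and `ratio_range` and the function name are assumptions, and color augmentation is left out.

```python
import math
import random


def sample_augmentation(width, height, rng,
                        scale_range=(0.08, 1.0),    # assumed area fraction
                        ratio_range=(3 / 4, 4 / 3),  # assumed aspect ratios
                        max_tries=10):
    """Sample one random crop box plus a flip flag for a width x height image."""
    area = width * height
    log_lo, log_hi = math.log(ratio_range[0]), math.log(ratio_range[1])
    for _ in range(max_tries):
        target_area = rng.uniform(*scale_range) * area
        aspect = math.exp(rng.uniform(log_lo, log_hi))
        w = int(round(math.sqrt(target_area * aspect)))
        h = int(round(math.sqrt(target_area / aspect)))
        if 0 < w <= width and 0 < h <= height:
            x = rng.randint(0, width - w)
            y = rng.randint(0, height - h)
            return {"x": x, "y": y, "w": w, "h": h,
                    "flip": rng.random() < 0.5}
    # Fallback when no sampled box fits: centered square crop.
    side = min(width, height)
    return {"x": (width - side) // 2, "y": (height - side) // 2,
            "w": side, "h": side, "flip": rng.random() < 0.5}


rng = random.Random(0)
print(sample_augmentation(640, 480, rng))
```

The sampled box would then be cropped, optionally flipped, and resized to the network input size, so the network sees a different view of the same image in every epoch.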
TRAINING PROCEDURE
Model A: trained on Train80 with AUG_A / tested on Val20
◦ learning rate 0.001 / dropped to 1/10 every 30 epochs
◦ training stopped at epoch 39 (original deadline : 9/2)
Model B: trained on Train100 with AUG_A (initialized from Model A)
◦ learning rate 0.0001 / training stopped at epoch 7 (original plan : 90 epochs)
◦ (Post submission) trained further, up to epoch 78
Model C: trained on Train100 with AUG_B (initialized from Model A)
◦ learning rate 0.001 / trained up to epoch 79 (added after the deadline was extended; time taken : 1.5 to 2 days)
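The "drop to 1/10 every 30 epochs" rule for Model A can be written as a step schedule. This is a minimal sketch that assumes epochs are counted from 1 and that the first drop happens at epoch 31. The slides do not state the exact indexing, and `step_learning_rate` is a name chosen for this example.

```python
def step_learning_rate(epoch, base_lr=0.001, drop_every=30, factor=0.1):
    """Learning rate for a 1-based epoch under a step schedule."""
    if epoch < 1:
        raise ValueError("epoch is 1-based")
    return base_lr * factor ** ((epoch - 1) // drop_every)


for epoch in (1, 30, 31, 39):
    print(epoch, step_learning_rate(epoch))
```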
TEST OUTLINE
• Various evaluation methods
◦ Multi-crop evaluation
◦ Multi-scale evaluation
◦ Dense evaluation : Fully-convolutional Network
• Model Ensemble
TEST EVALUATION
• Multi-crop evaluation
◦ When a fully-connected layer is present, the input image resolution is fixed at 224x224
• Multi-scale evaluation
• Dense evaluation : Fully-convolutional Network
A way of viewing each test image in many different ways
TEST EVALUATION Multi-crop & Multi-scale evaluation
Center Crop Multi Crop
Multi-crop evaluation Multi-scale evaluation
• Improves accuracy but increases the amount of computation
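To make the cost of multi-crop and multi-scale evaluation concrete, the sketch below enumerates the views of one test image and averages their class scores. The three short-side scales, the five-crop layout with flips, and all function names are assumptions for illustration. The slides do not list the scales or crops actually used.

```python
def resize_short_side(width, height, short_side):
    """Return (w, h) after scaling so the shorter side equals short_side."""
    if width <= height:
        return short_side, round(height * short_side / width)
    return round(width * short_side / height), short_side


def five_crop_origins(width, height, crop):
    """Top-left corners of the four corner crops and the center crop."""
    if width < crop or height < crop:
        raise ValueError("image smaller than crop")
    right, bottom = width - crop, height - crop
    return [(0, 0), (right, 0), (0, bottom), (right, bottom),
            (right // 2, bottom // 2)]


def multi_scale_views(width, height, short_sides=(256, 320, 384), crop=224):
    """List every (short_side, x, y, flip) view to run through the network."""
    views = []
    for s in short_sides:
        w, h = resize_short_side(width, height, s)
        for x, y in five_crop_origins(w, h, crop):
            for flip in (False, True):
                views.append((s, x, y, flip))
    return views


def average_scores(score_vectors):
    """Average per-view class scores into one prediction vector."""
    n = len(score_vectors)
    return [sum(col) / n for col in zip(*score_vectors)]


views = multi_scale_views(640, 480)
print(len(views))  # 3 scales x 5 crops x 2 flips
```

Under these assumed settings each test image needs one forward pass per view instead of the single pass of a center crop, which is where the extra computation comes from.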
RESULTS Multi-crop & Multi-scale evaluation
MODEL | NOTE | CROP | SCALE | TOP-1 ACCURACY
A | Train 80 | Multi | Single | 86.2 %
B (7 epoch) | Train 100 | Multi | Single | 88.1 %
B (78 epoch)* | Train 100, post submission | Multi | Single | 88.8 %
B (78 epoch)* | Train 100, post submission | Multi | Multi | 89.7 %
Test set results
TEST EVALUATION Dense evaluation (1)
• Fully convolutional network
◦ “For best results, we adopt the fully-convolutional form..” [Deep Residual Learning for Image Recognition, Kaiming He et al.]
◦ Convert the fully-connected layer into a convolutional layer to obtain a Fully Conv Net
◦ Input: image of arbitrary resolution / Output: classification score map
Figure: Typical CNN, converted into a Fully Conv Net
TEST EVALUATION Dense evaluation (2)
• Fully convolutional network (cont’d)
◦ Dense evaluation: a single forward pass through the network produces the classification score map
https://github.com/BVLC/caffe/blob/master/examples/net_surgery.ipynb
http://cs231n.github.io/convolutional-networks/
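The conversion can be illustrated on a toy feature map. The sketch below is not the team's Torch code. It assumes a ResNet-style head (average pooling followed by one fully-connected layer), applies the pooling as a sliding window and the fully-connected weights as a 1x1 convolution, then averages the resulting score map. Nested lists stand in for tensors, and all names are hypothetical.

```python
def avg_pool(fmap, k):
    """Slide a k x k average-pooling window (stride 1) over a [C][H][W] map."""
    height, width = len(fmap[0]), len(fmap[0][0])
    pooled = []
    for channel in fmap:
        rows = []
        for i in range(height - k + 1):
            row = []
            for j in range(width - k + 1):
                total = sum(channel[i + di][j + dj]
                            for di in range(k) for dj in range(k))
                row.append(total / (k * k))
            rows.append(row)
        pooled.append(rows)
    return pooled


def fc_as_1x1_conv(fmap, weight, bias):
    """Apply FC weights [K][C] at every position: score map [K][H][W]."""
    height, width = len(fmap[0]), len(fmap[0][0])
    return [[[bias[k] + sum(weight[k][c] * fmap[c][i][j]
                            for c in range(len(fmap)))
              for j in range(width)]
             for i in range(height)]
            for k in range(len(weight))]


def dense_scores(fmap, weight, bias, pool=7):
    """Average the classification score map into one score per class."""
    score_map = fc_as_1x1_conv(avg_pool(fmap, pool), weight, bias)
    return [sum(sum(row) for row in plane) / (len(plane) * len(plane[0]))
            for plane in score_map]


# Toy example: 2 channels, 2 classes, 2x2 pooling window.
fmap = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 8.0]]]
weight = [[1.0, 0.5], [-1.0, 2.0]]
bias = [0.0, 1.0]
print(dense_scores(fmap, weight, bias, pool=2))
```

When the feature map is exactly as large as the pooling window, the score map is 1x1 and the result equals the ordinary fixed-resolution network. A larger input simply yields a larger score map from the same single pass.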
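RESULTS Dense Evaluation
MODEL | NOTE | EVAL | SCALE | TOP-1 ACCURACY
B (7 epoch) | Train 100 | Multi Crop | Single | 88.1 %
B (7 epoch) | Train 100 | Dense | Single | 88.1 %
B (7 epoch) | Train 100 | Dense | Multi | 89.2 %
B (78 epoch)* | Train 100, post submission | Multi Crop | Single | 88.8 %
B (78 epoch)* | Train 100, post submission | Multi Crop | Multi | 89.7 %
B (78 epoch)* | Train 100, post submission | Dense | Single | 88.98 %
B (78 epoch)* | Train 100, post submission | Dense | Multi | 89.99 %
Test set results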
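RESULT Final Evaluation
• Final Evaluation
◦ {Multi-scale & Multi-crop} + {Multi-scale & Dense}
MODEL | NOTE | EVAL | SCALE | TOP-1 ACCURACY
B (78 epoch)* | Train 100, post submission | Multi Crop | Single | 88.8 %
B (78 epoch)* | Train 100, post submission | Multi Crop | Multi | 89.7 %
B (78 epoch)* | Train 100, post submission | Dense | Single | 88.98 %
B (78 epoch)* | Train 100, post submission | Dense | Multi | 89.99 %
B (78 epoch)* | Train 100, post submission | Final | Multi | 90.05 %
Test set results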
MODEL ENSEMBLE
• Model B + Model C
MODEL | NOTE | EVAL | SCALE | TOP-1 ACCURACY
B (7 epoch) | Train 100, AUG_A | Final | Multi | 89.3 %
B (78 epoch)* | Train 100, AUG_A, post submission | Final | Multi | 90.05 %
C | Train 100, AUG_B | Final | Multi | 89.8 %
B (7 epoch) + C | Final ranking : 1st place | Final | Multi | 89.9 %
B (78 epoch) + C | Post submission | Final | Multi | 90.14 %
Test set results
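The slides do not say how the scores of Model B and Model C were combined. A common choice, assumed here, is to average the per-class scores of the models and take the highest-scoring class. The toy scores and labels below are made up for illustration and are not contest data.

```python
def ensemble_scores(per_model_scores):
    """Average the class-score vectors that several models give one image."""
    n = len(per_model_scores)
    return [sum(col) / n for col in zip(*per_model_scores)]


def top1(scores):
    """Index of the highest-scoring class."""
    return max(range(len(scores)), key=scores.__getitem__)


def top1_accuracy(predictions, labels):
    """Fraction of images whose top-1 prediction equals the label."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


# Toy scores for two images and three classes (made-up numbers).
scores_b = [[0.7, 0.2, 0.1], [0.2, 0.45, 0.35]]
scores_c = [[0.4, 0.5, 0.1], [0.1, 0.2, 0.7]]
labels = [0, 2]

acc_b = top1_accuracy([top1(s) for s in scores_b], labels)
acc_c = top1_accuracy([top1(s) for s in scores_c], labels)
acc_ens = top1_accuracy(
    [top1(ensemble_scores([b, c])) for b, c in zip(scores_b, scores_c)],
    labels)
print(acc_b, acc_c, acc_ens)
```

In this toy case each model gets one image wrong, while the averaged scores get both right, which is the effect an ensemble aims for when the models make different mistakes.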
(Reference) IMAGENET RESULT ILSVRC 2016
Classification (ensemble of 4 models) : 5th in the world (1st among participants from Korea)
Localization : 5th in the world (1st among participants from Korea)
ANALYSIS
• Per-class results
[Figure: Top-1 accuracy per class. x-axis: Class (0 to 140), y-axis: Top-1 Accuracy (0.5 to 1)]
[Figure: Histogram of per-class Top-1 accuracy. x-axis: Top-1 Accuracy (0.5 to 1.0), y-axis: share of all classes (%), 0 to 70]
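The per-class plot and the histogram can both be derived from the list of labels and predictions. This is a minimal sketch with hypothetical helper names and made-up toy labels. Bins are computed with integer arithmetic to avoid floating-point edge cases at bin boundaries.

```python
from collections import Counter


def per_class_counts(labels, predictions):
    """Map each class to (number correct, number of test images)."""
    total = Counter(labels)
    correct = Counter(y for y, p in zip(labels, predictions) if y == p)
    return {c: (correct[c], total[c]) for c in total}


def accuracy_histogram(counts, n_bins=10):
    """Percent of classes whose top-1 accuracy falls in each equal-width bin."""
    bins = [0] * n_bins
    for correct, total in counts.values():
        index = min(n_bins * correct // total, n_bins - 1)
        bins[index] += 1
    return [100.0 * b / len(counts) for b in bins]


# Toy labels and predictions (made up for illustration).
labels = ["steak", "steak", "pizza", "pizza"]
predictions = ["filet_mignon", "steak", "pizza", "pizza"]
counts = per_class_counts(labels, predictions)
print(counts)
print(accuracy_histogram(counts))
```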
EASY CATEGORIES
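Edamame 98.3% Ganjang-gejang (soy sauce marinated crab) 100%
Oysters 98.3%
Naengmyeon (cold noodles) 100% (bibim-naengmyeon and mul-naengmyeon form one class)
Carbonara 98.2% Pajeon (scallion pancake) 99.1%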
HARD CATEGORIES
Steak 56.8% Filet mignon 73.4%
gt: steak , pred: filet_mignon
Apple pie 76.8 % gt: cheesecake , pred: strawberry_shortcake
Cheese cake 75.7% Chocolate cake 77.7% Chocolate mousse 77.4%
HARD CATEGORIES
Steak 56.8% Filet mignon
gt: steak , pred: filet_mignon
Hard Category Confusing categories
Inter-class similarities
Baby back ribs Pork chop Prime rib
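Confusions such as "gt: steak, pred: filet_mignon" can be found by counting (ground truth, prediction) pairs over the misclassified test images. The sketch below uses made-up toy labels, not the contest predictions, and `top_confusions` is a hypothetical helper name.

```python
from collections import Counter


def top_confusions(labels, predictions, k=3):
    """Return the k most frequent (ground truth, prediction) error pairs."""
    pairs = Counter((y, p) for y, p in zip(labels, predictions) if y != p)
    return pairs.most_common(k)


# Toy data (made up for illustration).
labels = ["steak", "steak", "steak", "cheesecake", "prime_rib"]
predictions = ["filet_mignon", "filet_mignon", "steak",
               "strawberry_shortcake", "prime_rib"]
for (gt, pred), count in top_confusions(labels, predictions):
    print("gt: %s , pred: %s (%d)" % (gt, pred, count))
```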
CONCLUSION
• Key factors to improve classification accuracy
◦ Random data augmentation in training
◦ Multi-scale, multi-crop & dense evaluations
◦ Model ensemble
• Lessons
◦ Food categories are fine-grained
◦ Intra-class variations & inter-class similarities
(Image: Prime rib)
APPLICATIONS
Google IM2Calories / Google Photo / Naver Matjip View (맛집뷰, restaurant view)
THANK YOU