ImageNet Classification with Deep Convolutional Neural Networks

Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, NIPS 2012

Eunsoo Oh (오은수)

2

ILSVRC

● ImageNet Large Scale Visual Recognition Challenge

● An image classification challenge with 1,000 categories (1.2 million images)

reference : http://www.image-net.org/challenges/LSVRC/2013/slides/ILSVRC2013_12_7_13_clsloc.pdf

Processing…

Deep Convolutional Neural Network (ILSVRC-2012 Winner)

3

Why Deep Learning?

● “Shallow” vs. “deep” architectures

reference : http://web.engr.illinois.edu/~slazebni/spring14/lec24_cnn.pdf

Learn a feature hierarchy all the way from pixels to classifier

4

Background

● A neuron

[Figure: inputs x1, x2, x3, …, xd (raw pixels) are multiplied by weights w1, w2, w3, …, wd and passed through f. Output: f(w·x+b)]

reference : http://en.wikipedia.org/wiki/Sigmoid_function#mediaviewer/File:Gjl-t(x).svg
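
The neuron output f(w·x+b) can be sketched in a few lines. This is a minimal illustration, assuming f is the logistic sigmoid (the function the slide's reference points to); the input, weight, and bias values are made up for the example.

```python
import math

def sigmoid(z):
    """Logistic sigmoid, one common choice for the activation f."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(x, w, b, f=sigmoid):
    """Return f(w·x + b) for input vector x, weight vector w and bias b."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return f(z)

# Illustrative values only (not taken from the slides).
x = [0.5, -1.0, 2.0]    # inputs, e.g. raw pixel intensities
w = [1.0, 0.5, 0.25]    # one weight per input
b = 0.0                 # bias
y = neuron(x, w, b)     # here w·x + b = 0.5, so y = sigmoid(0.5)
```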

5

Background

● Multi-Layer Neural Networks

● Nonlinear classifier

● Learning can be done by gradient descent, with gradients computed by the back-propagation algorithm

[Figure: input layer, hidden layer, output layer]
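
As a rough sketch of gradient descent with back-propagation, the code below trains a tiny network with one hidden layer. The sigmoid units, the squared-error loss, the layer sizes, and all numeric values are assumptions chosen for brevity; they are not taken from the slides or the paper.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Input layer -> sigmoid hidden layer -> single sigmoid output unit."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    y = sigmoid(sum(w * hi for w, hi in zip(W2, h)) + b2)
    return h, y

def loss(x, t, W1, b1, W2, b2):
    """Squared error between the network output and the target t."""
    _, y = forward(x, W1, b1, W2, b2)
    return 0.5 * (y - t) ** 2

def backprop(x, t, W1, b1, W2, b2):
    """Gradients of the loss, propagated from the output back to the input."""
    h, y = forward(x, W1, b1, W2, b2)
    d_out = (y - t) * y * (1.0 - y)                 # error at the output unit
    gW2 = [d_out * hi for hi in h]
    gb2 = d_out
    d_hid = [d_out * w * hi * (1.0 - hi) for w, hi in zip(W2, h)]
    gW1 = [[d * xi for xi in x] for d in d_hid]
    gb1 = list(d_hid)
    return gW1, gb1, gW2, gb2

def gradient_step(x, t, W1, b1, W2, b2, lr=0.1):
    """One gradient-descent update of every weight and bias."""
    gW1, gb1, gW2, gb2 = backprop(x, t, W1, b1, W2, b2)
    W1 = [[w - lr * g for w, g in zip(rw, rg)] for rw, rg in zip(W1, gW1)]
    b1 = [b - lr * g for b, g in zip(b1, gb1)]
    W2 = [w - lr * g for w, g in zip(W2, gW2)]
    b2 = b2 - lr * gb2
    return W1, b1, W2, b2

# Illustrative example: two inputs, two hidden units, one output.
x, t = [1.0, 0.5], 1.0
W1, b1 = [[0.1, -0.2], [0.4, 0.3]], [0.0, 0.1]
W2, b2 = [0.5, -0.5], 0.0
```

Repeating `gradient_step` on the same example should drive the loss down; a real network would loop over many examples instead of one.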

6

Background

● Convolutional Neural Networks

● Variation of multi-layer neural networks

● Kernel (Convolution Matrix)

reference : http://en.wikipedia.org/wiki/Kernel_(image_processing)
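
Applying a kernel (convolution matrix) to an image can be sketched as below. This is a plain-Python illustration without padding or stride; the edge-detecting kernel and the toy image are assumptions for the example, not values from the slides.

```python
def convolve2d(image, kernel):
    """'Valid' 2D convolution of a grayscale image (list of rows) with a kernel.

    The kernel is flipped, as in the mathematical definition of convolution.
    CNN libraries commonly skip the flip (cross-correlation); with learned
    weights the two differ only in how the kernel is laid out.
    """
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    flipped = [row[::-1] for row in kernel[::-1]]
    out = []
    for r in range(ih - kh + 1):
        out_row = []
        for c in range(iw - kw + 1):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * flipped[i][j]
            out_row.append(acc)
        out.append(out_row)
    return out

# Illustrative 3x3 horizontal-difference kernel and a toy 3x4 image
# containing a vertical edge between columns 1 and 2.
edge_kernel = [[0, 0, 0],
               [-1, 0, 1],
               [0, 0, 0]]
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
feature_map = convolve2d(image, edge_kernel)   # a 1x2 feature map
```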

7

Background

● Convolutional Filter

reference : http://cs.nyu.edu/~fergus/tutorials/deep_learning_cvpr12/fergus_dl_tutorial_final.pptx

[Figure: a convolutional filter applied to an input yields a feature map]

8

Proposed Method

● Deep Convolutional Neural Network

● 5 convolutional and 3 fully connected layers

● 650,000 neurons, 60 million parameters

● Several techniques to boost performance

● ReLU nonlinearity

● Training on Multiple GPUs

● Overlapping max pooling

● Data Augmentation

● Dropout

9

Rectified Linear Units (ReLU)

reference : http://www.image-net.org/challenges/LSVRC/2012/supervision.pdf
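
The ReLU nonlinearity is f(x) = max(0, x). The sketch below compares its gradient with that of tanh, a saturating nonlinearity, to suggest why non-saturating units can train faster; the helper names are illustrative.

```python
import math

def relu(x):
    """Rectified linear unit: f(x) = max(0, x)."""
    return max(0.0, x)

def relu_grad(x):
    """Gradient of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0

def tanh_grad(x):
    """Gradient of tanh, which shrinks toward 0 as |x| grows (saturation)."""
    return 1.0 - math.tanh(x) ** 2

# For a large positive input the ReLU gradient stays at 1,
# while the tanh gradient is already close to 0.
for x in (0.5, 2.0, 5.0):
    print(x, relu(x), relu_grad(x), round(tanh_grad(x), 5))
```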

10

Training on Multiple GPUs

● Spread across two GPUs

● GTX 580 GPU with 3GB memory

● GPUs are particularly well-suited to cross-GPU parallelization, as they can read from and write to one another's memory directly

● Very efficient implementation of CNN on GPUs

11

Pooling

● Spatial Pooling

● Non-overlapping / overlapping regions

● Sum or max

reference : http://cs.nyu.edu/~fergus/tutorials/deep_learning_cvpr12/fergus_dl_tutorial_final.pptx

Max

Sum
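
Spatial pooling can be sketched with a window size and a stride: when the stride is smaller than the window, neighboring regions overlap. The paper's overlapping pooling uses 3x3 windows with stride 2; the 5x5 feature map below is an illustrative assumption.

```python
def pool2d(fmap, size, stride, op=max):
    """Pool size x size windows of a 2D feature map, moving by `stride`.

    op=max gives max pooling, op=sum gives sum pooling.
    """
    h, w = len(fmap), len(fmap[0])
    out = []
    for r in range(0, h - size + 1, stride):
        row = []
        for c in range(0, w - size + 1, stride):
            window = [fmap[r + i][c + j]
                      for i in range(size) for j in range(size)]
            row.append(op(window))
        out.append(row)
    return out

# Toy 5x5 feature map holding the values 1..25.
fmap = [[5 * r + c + 1 for c in range(5)] for r in range(5)]

overlapping = pool2d(fmap, size=3, stride=2)        # windows share a row/column
non_overlapping = pool2d(fmap, size=2, stride=2)    # windows are disjoint
summed = pool2d(fmap, size=2, stride=2, op=sum)     # sum instead of max
```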

12

Data Augmentation

[Figure: 224x224 patches and their horizontal flips are extracted from a 256x256 training image to produce additional training images]

Enlarge the dataset!
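
The crop-and-flip augmentation can be sketched as follows. This is a minimal version that represents an image as a list of pixel rows; the function name and the `rng` parameter are hypothetical, and the 50% flip probability is an assumption.

```python
import random

def random_crop_and_flip(image, crop=224, rng=random):
    """Return a random crop x crop patch, mirrored horizontally half the time."""
    h, w = len(image), len(image[0])
    top = rng.randint(0, h - crop)      # randint bounds are inclusive
    left = rng.randint(0, w - crop)
    patch = [row[left:left + crop] for row in image[top:top + crop]]
    if rng.random() < 0.5:
        patch = [row[::-1] for row in patch]   # horizontal flip
    return patch

# Toy 256x256 "image" whose pixels record their own (row, column) position.
image = [[(r, c) for c in range(256)] for r in range(256)]
patch = random_crop_and_flip(image, rng=random.Random(0))
```

Calling the function again on the same image yields a different patch, which is how one training image becomes many.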

13

Dropout

● Independently set each hidden unit activity to zero with 0.5 probability

● Used in the two globally-connected hidden layers at the net's output

reference : http://www.image-net.org/challenges/LSVRC/2012/supervision.pdf
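
Dropout as described above can be sketched in a few lines. During training each hidden activity is zeroed with probability 0.5; at test time, following the paper, all units are kept and their outputs are multiplied by 0.5. The function names and the `rng` parameter are illustrative.

```python
import random

def dropout_train(activations, p_drop=0.5, rng=random):
    """Independently set each activity to zero with probability p_drop."""
    return [0.0 if rng.random() < p_drop else a for a in activations]

def dropout_test(activations, p_drop=0.5):
    """At test time keep every unit but scale it by the keep probability."""
    return [a * (1.0 - p_drop) for a in activations]

hidden = [1.0] * 1000
dropped = dropout_train(hidden, rng=random.Random(0))   # roughly half are zero
scaled = dropout_test([2.0, 4.0])                       # every unit halved
```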

14

Overall Architecture

● Trained with stochastic gradient descent on two NVIDIA GPUs for about a week (five to six days)

● 650,000 neurons, 60 million parameters, 630 million connections

● The last layer contains 1,000 neurons, which produce a distribution over the 1,000 class labels.
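
In the paper, the distribution over class labels comes from a 1,000-way softmax on the last layer's outputs. A minimal sketch, using three made-up scores instead of 1,000:

```python
import math

def softmax(logits):
    """Turn raw output scores into probabilities that sum to 1."""
    m = max(logits)                        # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])           # illustrative scores for 3 classes
top_class = probs.index(max(probs))        # index of the most probable class
```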

15

Results

● ILSVRC-2010 test set

ILSVRC-2010 winner

Previous best published result

Proposed Method

16

Results

● ILSVRC-2012 results

Proposed method: top-5 error rate 16.422%

Runner-up: top-5 error rate 26.172%

reference : http://image-net.org/challenges/LSVRC/2012/ilsvrc2012.pdf

17

Qualitative Evaluations

18

Qualitative Evaluations

19

ILSVRC-2013 Classification

reference : http://www.image-net.org/challenges/LSVRC/2013/slides/ILSVRC2013_12_7_13_clsloc.pdf

20

ILSVRC-2014 Classification

GoogLeNet: 22 layers; VGG: 19 layers

21

Conclusion

● Large, deep convolutional neural networks for large-scale image classification were proposed

● 5 convolutional layers, 3 fully-connected layers

● 650,000 neurons, 60 million parameters

● Several techniques for boosting performance

● Several techniques for reducing overfitting

● The proposed method won ILSVRC-2012

● Achieved a winning top-5 error rate of 15.3%, compared to 26.2% achieved by the second-best entry

22

Q & A

???

23

Quiz

● 1. The proposed method used hand-designed features, thus there is no need to learn features and feature hierarchies. (True / False)

● 2. Which technique was not used in this paper?

① Dropout

② Rectified Linear Units nonlinearity

③ Training on multiple GPUs

④ Local contrast normalization

24

Appendix: Feature Visualization

● 96 learned low-level(1st layer) filters

25

Appendix: Visualizing CNN

reference : M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional neural networks. arXiv preprint arXiv:1311.2901, 2013.

26

Appendix: Local Response Normalization

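● $a^i_{x,y}$ : the activity of a neuron computed by applying kernel i at position (x, y)

● The response-normalized activity $b^i_{x,y}$ is given by

$$b^i_{x,y} = a^i_{x,y} \Big/ \left( k + \alpha \sum_{j=\max(0,\, i-n/2)}^{\min(N-1,\, i+n/2)} \left( a^j_{x,y} \right)^2 \right)^{\beta}$$

● N : the total # of kernels in the layer

● n : hyper-parameter (number of adjacent kernel maps summed over), n=5

● k : hyper-parameter, k=2

● α : hyper-parameter, α=10^(-4)

● β : hyper-parameter, β=0.75 (value from the paper)

● This aids generalization even though ReLUs do not require input normalization to prevent saturation.

● This reduces the top-5 error rate by 1.2%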
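
A sketch of local response normalization at a single position (x, y): each kernel's activity is divided by a term that sums the squared activities of its n neighboring kernels. The exponent β = 0.75 is the value from the paper, and treating n/2 as integer division is an assumption of this sketch.

```python
def local_response_norm(a, n=5, k=2.0, alpha=1e-4, beta=0.75):
    """Normalize the activities a[0..N-1] of N kernels at one position (x, y)."""
    N = len(a)
    half = n // 2                       # assumption: n/2 taken as integer division
    out = []
    for i in range(N):
        lo = max(0, i - half)
        hi = min(N - 1, i + half)
        sq_sum = sum(a[j] ** 2 for j in range(lo, hi + 1))
        out.append(a[i] / (k + alpha * sq_sum) ** beta)
    return out

# Illustrative activities of 6 kernels at the same spatial position.
activities = [0.0, 1.0, 10.0, 100.0, 1.0, 0.5]
normalized = local_response_norm(activities)
```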

27

Appendix: Another Data Augmentation

● Alter the intensities of the RGB channels in training images

● Perform PCA on the set of RGB pixel values

● To each training image, add multiples of the found principal components

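● To each RGB image pixel $I_{xy} = [I^R_{xy}, I^G_{xy}, I^B_{xy}]^T$, add the following quantity

$$[\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3]\,[\alpha_1 \lambda_1, \alpha_2 \lambda_2, \alpha_3 \lambda_3]^T$$

● $\mathbf{p}_i$, $\lambda_i$ : i-th eigenvector and eigenvalue of the 3x3 covariance matrix of RGB pixel values

● $\alpha_i$ : random variable drawn from a Gaussian with mean 0 and standard deviation 0.1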

● This reduces top-1 error rate by over 1%
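
The RGB intensity shift can be sketched as below, assuming the eigenvectors and eigenvalues of the RGB covariance matrix have already been computed elsewhere. The function names are hypothetical, and the eigenvectors and eigenvalues in the example are placeholders rather than values from ImageNet.

```python
import random

def pca_color_offset(eigvecs, eigvals, sigma=0.1, rng=random):
    """Return the [R, G, B] offset: sum over i of alpha_i * lambda_i * p_i.

    eigvecs[i] is the i-th eigenvector (length 3), eigvals[i] its eigenvalue.
    Each alpha_i is drawn from a Gaussian with mean 0 and std `sigma`.
    """
    alphas = [rng.gauss(0.0, sigma) for _ in eigvals]
    return [sum(a * lam * p[ch] for a, lam, p in zip(alphas, eigvals, eigvecs))
            for ch in range(3)]

def shift_pixels(pixels, offset):
    """Add the same RGB offset to every pixel of one image."""
    return [[v + d for v, d in zip(px, offset)] for px in pixels]

# Placeholder eigen-decomposition (hypothetical, for illustration only).
eigvecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
eigvals = [1.0, 2.0, 3.0]
offset = pca_color_offset(eigvecs, eigvals, rng=random.Random(0))
augmented = shift_pixels([[10.0, 20.0, 30.0]], offset)
```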

28

Appendix: Details of Learning

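● Use stochastic gradient descent with a batch size of 128 examples, momentum of 0.9, and weight decay of 0.0005

● The update rule for weight w was

$$v_{i+1} := 0.9\, v_i - 0.0005\, \epsilon\, w_i - \epsilon \left\langle \frac{\partial L}{\partial w} \Big|_{w_i} \right\rangle_{D_i}$$

$$w_{i+1} := w_i + v_{i+1}$$

● i : the iteration index

● v : the momentum variable

● $\epsilon$ : the learning rate, initialized at 0.01 and reduced three times prior to termination

● $\left\langle \frac{\partial L}{\partial w} \big|_{w_i} \right\rangle_{D_i}$ : the average over the i-th batch $D_i$ of the derivative of the objective with respect to w, evaluated at $w_i$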

● Train for 90 cycles through the training set of 1.2 million images
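
The paper's update rule combines momentum, weight decay, and the batch-averaged gradient. A minimal sketch for a single scalar weight is shown below; the function name and the example values are illustrative, and `grad` stands for the gradient already averaged over the batch.

```python
def sgd_step(w, v, grad, lr=0.01, momentum=0.9, weight_decay=0.0005):
    """One update: v <- momentum*v - weight_decay*lr*w - lr*grad, then w <- w + v."""
    v_next = momentum * v - weight_decay * lr * w - lr * grad
    w_next = w + v_next
    return w_next, v_next

# Illustrative values: a weight of 1.0, no velocity yet, batch gradient 2.0.
w, v = 1.0, 0.0
w, v = sgd_step(w, v, grad=2.0)
```

With these inputs the velocity becomes -0.020005 (a weight-decay term of -0.000005 plus a gradient term of -0.02), so the weight moves to 0.979995.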
