Computer Vision Lab Seminar (Deep Learning) - Yong Hoon Kwon


TRANSCRIPT

Page 1

Deep Learning: Basic Theory and Other Applications

Yong Hoon Kwon (권용훈)

Page 2

Table of Contents

1. Representation Learning

2. Background

3. Concepts and Principles

4. Applications

Page 3

Page 4

Page 5

Trends in Pattern Recognition

• Upheaval in pattern recognition due to deep learning

Page 6

1. Representation Learning

Page 7

Representation Learning

The computer understands information by itself.

Car

Page 8

Representation Learning

The computer understands information by itself.

Car

HOW?

Page 9

Representation Learning

V: Visible variable, observable in the real world
H: Hidden variable, not observable in the real world

Page 10

Representation Learning

All: everything can be expressed by hidden variables.

Page 11

Representation Learning

[Diagram: visible variables (V) and hidden variables (H) inside the set of everything ("All")]

Page 12

Page 13

Page 14

Representation Learning

[Diagram sequence: hidden variables (H) are connected to progressively more visible variables (V)]

Page 15

Representation Learning

Connect a number of visible variables (V) and hidden variables (H).

[Diagram: visible nodes connected to hidden nodes]

Page 16

Representation Learning

These connections form the structural relationships that make up something.

[Diagram: visible nodes connected to hidden nodes]

Page 17

Representation Learning

V -> H -> X (something)
• Expression
• Summary
• Encoding
• Abstraction

[Diagram: ten V nodes feeding five H nodes, which feed a single X]

Page 18

Representation Learning

Each hidden variable is connected to all of the visible variables.

[Diagram: V nodes fully connected to H nodes; below, v nodes fully connected to a single h node leading to X]

Page 19

Representation Learning

Single layer

[Diagram: visible nodes (v) connected to one layer of hidden nodes (h), which produce X]

Page 20

Representation Learning

Multi layer

[Diagram: visible nodes (v) connected to several stacked layers of hidden nodes (h), which produce X]

Page 21

Representation Learning

Intuitive interpretation of multiple layers

[Diagram: visible nodes, two hidden layers, and X]

Page 22

Representation Learning

Intuitive interpretation of multiple layers: each hidden layer is a further abstraction of the layer below.

[Diagram: visible nodes, two hidden layers, and X, with each step labeled "abstraction"]

Page 23

Representation Learning

Intuitive interpretation of multiple layers: each hidden layer is a further abstraction of the layer below.

[Diagram repeated from the previous slide]

Page 24

2. Background

Page 25

Neural networks history

• Deep learning is all about deep neural networks
• 1949: Hebbian learning (Donald Hebb, the father of neural networks)
• 1958: (single-layer) Perceptron (Frank Rosenblatt)
• 1969: Marvin Minsky points out the limitations of the single-layer perceptron
• 1986: Multilayer perceptron / backpropagation (David Rumelhart, Geoffrey Hinton, and Ronald Williams)
• 2006: Deep neural networks (Geoffrey Hinton and Ruslan Salakhutdinov)

Page 26

Why neural networks?

• Weaknesses of kernel machines (SVMs, etc.):
  • They do not scale well with sample size.
  • They are based on matching local templates: the training data is referenced for each test sample.
  • Local representation vs. distributed representation
• Historical progression: neural networks -> kernel machines -> deep neural networks

Page 27

Artificial Neural Networks (ANNs)

[Figure: neuron and synapse in the brain vs. an artificial neural network]

• ANNs are computational models inspired by the brain
  • Processing units (nodes vs. neurons)
  • Connections (weights vs. synapses)

Page 28

Artificial Neural Network (ANN)

[Diagram: inputs x_1, x_2, x_3, ..., x_n with weights w_1, w_2, w_3, ..., w_n and a bias feeding a single output node]

Page 29

Artificial Neural Network (ANN)

[Diagram: inputs x_1, ..., x_n, weights w_1, ..., w_n, bias b, and output y]

$z = \sum_{i=1}^{n} w_i x_i + b, \quad y = H(z)$

where $H$ is the activation function and $b$ the bias.
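To make the forward computation above concrete, here is a minimal sketch of a single artificial neuron in Python with NumPy; the function names and the choice of a logistic sigmoid for the activation H are illustrative assumptions, not something specified by the slides.

```python
import numpy as np

def sigmoid(z):
    # Logistic activation H(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

def neuron_forward(x, w, b):
    # z = sum_i w_i * x_i + b ; y = H(z)
    z = np.dot(w, x) + b
    return sigmoid(z)

x = np.array([0.5, -1.0, 2.0])   # inputs x_1..x_n
w = np.array([0.1, 0.4, -0.3])   # weights w_1..w_n
b = 0.2                          # bias
print(neuron_forward(x, w, b))
```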

Page 30

Page 31

Page 32

Deep Neural Network

[Diagram: an input layer, several hidden layers, and an output layer of a deep neural network]

Page 33

Training a Deep Neural Network

Iteratively update W along the error gradient -> gradient descent

[Diagram: network with input X, output y, target t, and weights w_ij^(k) on each layer]

Given a training set {(x, t)}, find W that minimizes the error between the network output y and the target t.

Page 34

Gradient Descent

[Figure from http://darkpgmr.tistory.com/133]

Gradient ascent vs. gradient descent: both find a local optimum, not necessarily the global optimum.
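As a concrete illustration of the update rule, a minimal gradient-descent sketch in Python; the quadratic loss, learning rate, and step count are illustrative assumptions.

```python
import numpy as np

def grad_descent(grad, w0, lr=0.1, steps=100):
    # Iteratively move against the gradient: w <- w - lr * dE/dw
    w = np.array(w0, dtype=float)
    for _ in range(steps):
        w -= lr * grad(w)
    return w

# Example: minimize E(w) = (w[0] - 3)^2 + (w[1] + 1)^2
grad = lambda w: np.array([2 * (w[0] - 3), 2 * (w[1] + 1)])
print(grad_descent(grad, [0.0, 0.0]))   # converges toward [3, -1]
```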

Page 35

Backpropagation

[Diagram: network with input X, output y, target t, and weights w_ij^(k)]

• Using the chain rule, propagate error derivatives backwards to compute each node's contribution to the error:
  $\delta_i^{(k)} = \Big(\sum_j w_{ij}^{(k)} \, \delta_j^{(k+1)}\Big) \, \sigma'\big(z_i^{(k)}\big)$
• Compute the error derivative of each weight using $\partial E / \partial w_{ij}^{(k)} = a_i^{(k)} \, \delta_j^{(k+1)}$
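A minimal end-to-end backpropagation sketch for a one-hidden-layer network trained on XOR with a squared-error loss; the hidden size, learning rate, iteration count, and random seed are illustrative assumptions, and convergence may require more iterations for other seeds.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: learn XOR with one hidden layer
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(0, 1, (2, 4)), np.zeros(4)   # input -> hidden
W2, b2 = rng.normal(0, 1, (4, 1)), np.zeros(1)   # hidden -> output
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(5000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    y = sigmoid(h @ W2 + b2)
    # Backward pass (chain rule): deltas for the squared-error loss
    d_out = (y - T) * y * (1 - y)            # delta at the output layer
    d_hid = (d_out @ W2.T) * h * (1 - h)     # delta at the hidden layer
    # Gradient-descent weight updates
    W2 -= 0.5 * h.T @ d_out;  b2 -= 0.5 * d_out.sum(0)
    W1 -= 0.5 * X.T @ d_hid;  b1 -= 0.5 * d_hid.sum(0)

print(np.round(y, 2))   # should approach [[0], [1], [1], [0]]
```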

Page 36

3. Concepts and Principles

Page 37

Paradigm shift in pattern recognition

Shallow learning:
• Feature extraction by domain experts (SIFT, SURF, ORB, ...)
• Separate modules (feature extractor + trainable classifier)

Deep learning:
• Automatic feature extraction from data
• Unified model: end-to-end learning (trainable features + trainable classifier)

Page 40

Why deep?

• The human brain has at least 5 to 10 layers for visual processing
• A "hierarchical model" is necessary for human-level intelligence

Page 41

What good comes from "deep"? "Deep" means more layers.

• The representation gets more hierarchical and abstract.
• It increases the model complexity, which can lead to higher accuracy.

[Diagram: a shallow network mapping x_1, x_2 to y_1, y_2 through a single set of weights w_1, w_2]

Page 42

What good comes from "deep"? "Deep" means more layers.

• The representation gets more hierarchical and abstract.
• It increases the model complexity, which can lead to higher accuracy.

[Diagram: a deep network mapping x_1, x_2 to y_1, y_2 through hidden layers h^(1), h^(2), h^(3) with weights w^(1), ..., w^(4)]

Page 43

Pre-training

• Backpropagation may not work well with a deep network
  • Vanishing gradient problem: the backward error information vanishes
  • Lower layers may not learn much about the task
• Good initialization is crucial -> pre-training

Page 44

Deep Learning: So Why Now?

• Neural networks have been around since the 60's, but deep NNs were difficult to train, due to:
  • Lack of datasets large enough to train them
  • Lack of computing power
  • Lack of efficient training algorithms and techniques
• Now we have all of the above:
  • Readily available large-scale datasets
  • GPUs, multicore/cluster systems
  • DBN [Hinton 06], ReLU (rectified linear unit), dropout, ...
• Still, more thorough theoretical analysis is needed to understand why it works well (or not)

Page 45

Deep Belief Networks (DBNs)

• Probabilistic generative model with supervised fine-tuning
• Generative fine-tuning: up-down algorithm
• Discriminative fine-tuning: backpropagation

Page 46

Updating Weights

• How much to update?
  • Learning rate ($\eta$): $\Delta w = -\eta \, \partial E / \partial w$
  • Fixed or adaptive
  • Common recipe: reduce the learning rate when the validation error stops decreasing

[Plot: error vs. epoch, with drops where the learning rate is reduced]

Page 47

Updating Weights

• How much to update?
  • Learning rate ($\eta$): fixed or adaptive
  • Common recipe: reduce the learning rate when the validation error stops decreasing
  • Momentum ($v$): forces gradient descent to keep moving in the previous direction

Page 48

Updating Weights

• How often to update?
  • After every training sample (online learning)
  • After iterating over the entire training set (full batch)
  • After some training samples (mini-batch)
• Stochastic gradient descent
  • Faster convergence than full batch
  • Efficient computation on GPUs
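A minimal sketch of mini-batch stochastic gradient descent with momentum, tying the last three slides together; the linear-regression loss, batch size, and hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                                     # toy inputs
t = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=1000)   # toy targets

w = np.zeros(3)
v = np.zeros(3)                                # momentum buffer
lr, mu, batch = 0.05, 0.9, 32

for epoch in range(20):
    idx = rng.permutation(len(X))              # shuffle each epoch
    for s in range(0, len(X), batch):
        b = idx[s:s + batch]
        y = X[b] @ w
        grad = X[b].T @ (y - t[b]) / len(b)    # dE/dw for the squared error
        v = mu * v - lr * grad                 # momentum update
        w = w + v
print(w)   # close to [1.0, -2.0, 0.5]
```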

Page 49

Regularization

• Ways to avoid overfitting:
  • Weight decay
  • Weight sharing (CNN)
  • Early stopping
  • Model averaging (various models)
  • Dropout (more on this later)
  • Pre-training (good initialization)
  • Adding noise to training data

Page 50

Dropout

• Consider a neural net with one hidden layer
• Each time we present a training example, randomly omit each hidden unit with probability 0.5
• This randomly samples from different architectures; all architectures share weights
• An efficient way to average many large neural nets

[Diagram: units kept when the random value > 0.5, dropped when the random value < 0.5]
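A minimal sketch of applying dropout to a hidden activation during training, using inverted scaling at training time so the test-time forward pass is unchanged; the 0.5 keep probability matches the slide, everything else (names, inverted-dropout variant) is an illustrative assumption.

```python
import numpy as np

def dropout(h, p_drop=0.5, train=True, rng=np.random.default_rng()):
    # Randomly zero each hidden unit with probability p_drop during training.
    if not train:
        return h                      # at test time, use all units
    mask = rng.random(h.shape) >= p_drop
    return h * mask / (1.0 - p_drop)  # inverted scaling keeps the expected value

h = np.array([0.2, 0.9, 0.4, 0.7, 0.1])
print(dropout(h))                 # roughly half the units are zeroed
print(dropout(h, train=False))    # unchanged at test time
```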

Page 51

Other Training Details

• Choice of nonlinear function:
  • Logistic function
  • Tanh
    (both suffer from the saturation problem: slow convergence due to near-zero gradients)
  • ReLU (rectified linear unit)
    • f(x) = max(0, x)
    • Non-saturating
    • Faster convergence [Nair 10]
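To see the saturation issue numerically, a small sketch comparing the gradients of the three activations at a small and a large input; the example values are purely illustrative.

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
# Derivatives of the three activation functions
d_sigmoid = lambda z: sigmoid(z) * (1 - sigmoid(z))
d_tanh    = lambda z: 1 - np.tanh(z) ** 2
d_relu    = lambda z: float(z > 0)

for z in (0.0, 5.0):
    print(z, d_sigmoid(z), d_tanh(z), d_relu(z))
# At z = 5 the sigmoid/tanh gradients are near zero (saturation); ReLU stays at 1.
```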

Page 52

Other Training Details

• Softmax and cross-entropy [Ref.]
  • Normally used instead of the squared-error loss
  • Appropriate for representing a probability distribution
• Input preprocessing
  • Zero-mean, unit-variance input data yields a better-shaped error surface
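A minimal, numerically stable softmax and cross-entropy sketch; the function names and example logits are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability; the output sums to 1.
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(p, target_index):
    # Negative log-likelihood of the true class.
    return -np.log(p[target_index])

logits = np.array([2.0, 0.5, -1.0])
p = softmax(logits)
print(p, cross_entropy(p, 0))
```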

Page 53

Recurrent Neural Network (RNN)

[Diagram: an RNN cell with input x_t and hidden state h_t, unrolled over time as (x_0, h_0), (x_1, h_1), (x_2, h_2), ..., (x_t, h_t)]

[http://karpathy.github.io/2015/05/21/rnn-effectiveness]
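A minimal sketch of the vanilla RNN recurrence h_t = tanh(W_xh x_t + W_hh h_{t-1} + b), in the spirit of the linked post; the sizes, initialization, and random input sequence are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 4, 8
W_xh = rng.normal(0, 0.1, (n_hid, n_in))
W_hh = rng.normal(0, 0.1, (n_hid, n_hid))
b = np.zeros(n_hid)

def rnn_step(x_t, h_prev):
    # One time step: new hidden state from the current input and the previous state
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b)

h = np.zeros(n_hid)
for x_t in rng.normal(size=(5, n_in)):   # a length-5 input sequence
    h = rnn_step(x_t, h)
print(h.shape)
```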

Page 54

Bidirectional Recurrent Neural Network (BRNN)

• A bidirectional RNN uses both past and future context for every point in the sequence
• Two hidden layers (forward and backward) share the same output layer

[Figure: visualization of the amount of input information used for prediction by different network structures] [Schuster 97]

Page 55

Long Short-Term Memory (LSTM)

• LSTM works successfully with sequential data (handwriting, speech, etc.)
• LSTM can model very long-term sequential patterns
• Longer memory has a stabilizing effect
• A node itself is a deep network

Page 56

Problem of RNNs

[Diagram: RNN vs. LSTM]

• RNNs forget previous inputs (vanishing gradient)
• An LSTM remembers previous data and can recall it when it wants

Page 57

Step-by-Step LSTM Walk-Through

Forget gate: given the previous output $h_{t-1}$ and the current input $x_t$, decide what to discard from the cell state $C_{t-1}$:

$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$

[http://colah.github.io/posts/2015-08-Understanding-LSTMs]

Page 58

Step-by-Step LSTM Walk-Through

Input gate and candidate cell state:

$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$
$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$

[http://colah.github.io/posts/2015-08-Understanding-LSTMs]

Page 59

Step-by-Step LSTM Walk-Through

Cell state update:

$C_t = f_t \ast C_{t-1} + i_t \ast \tilde{C}_t$

[http://colah.github.io/posts/2015-08-Understanding-LSTMs]

Page 60

Step-by-Step LSTM Walk-Through

Output gate and new hidden state:

$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$
$h_t = o_t \ast \tanh(C_t)$

[http://colah.github.io/posts/2015-08-Understanding-LSTMs]
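Putting the four steps together, a minimal NumPy sketch of one LSTM cell step following the equations above; the weight shapes, random initialization, and input sequence are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 3, 5
# One weight matrix and bias per gate, acting on the concatenation [h_{t-1}, x_t]
W_f, W_i, W_C, W_o = (rng.normal(0, 0.1, (n_hid, n_hid + n_in)) for _ in range(4))
b_f = b_i = b_C = b_o = np.zeros(n_hid)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev):
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(W_f @ z + b_f)            # forget gate
    i = sigmoid(W_i @ z + b_i)            # input gate
    C_tilde = np.tanh(W_C @ z + b_C)      # candidate cell state
    C = f * C_prev + i * C_tilde          # new cell state
    o = sigmoid(W_o @ z + b_o)            # output gate
    h = o * np.tanh(C)                    # new hidden state
    return h, C

h, C = np.zeros(n_hid), np.zeros(n_hid)
for x_t in rng.normal(size=(4, n_in)):    # a short input sequence
    h, C = lstm_step(x_t, h, C)
print(h)
```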

Page 61

LSTM Regularization with Dropout

• The dropout operator is applied only to non-recurrent connections [Zaremba 14]

[Figure: dashed arrows mark connections where the dropout operator D is applied, solid arrows are untouched; h_t^l denotes the hidden state in layer l at timestep t]

[Table: frame-level speech recognition accuracy]

Page 62

Autoencoder

• Regress from an observation to itself (input X1 -> output X1)
• Example: data compression (JPEG, etc.) [Lemme 10]

[Diagram: input, hidden, and output layers; weights W encode the input and weights V decode it back]
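A minimal sketch of a linear autoencoder with a bottleneck hidden layer trained by gradient descent to reconstruct its input; the layer sizes, untied weights, learning rate, and toy data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))            # toy data, 8-dimensional
W = rng.normal(0, 0.1, (8, 3))           # encoder: 8 -> 3 (the bottleneck)
V = rng.normal(0, 0.1, (3, 8))           # decoder: 3 -> 8

lr = 0.05
for _ in range(500):
    H = X @ W                            # encode (linear, for simplicity)
    R = H @ V                            # decode / reconstruct
    err = R - X                          # reconstruction error
    gV = H.T @ err / len(X)              # gradient w.r.t. the decoder
    gW = X.T @ (err @ V.T) / len(X)      # gradient w.r.t. the encoder
    V -= lr * gV
    W -= lr * gW

print(np.mean((X @ W @ V - X) ** 2))     # reconstruction MSE decreases over training
```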

Page 63

Dark Knowledge [Hinton 14]

Softened outputs reveal the dark knowledge in the ensemble.

[Figure: for the classes cow, dog, cat, bus ("dog" being the true class), the original one-hot target (0, 1, 0, 0), the hard output of the ensemble (values like 0.9, 0.1, 10^-8, 10^-4), and a softened output (values like 0.05, 0.7, 0.5, 0.01); softening spreads probability mass onto similar classes]

Page 64

Dark Knowledge [Hinton 14]

• The distribution of the top layer has more information than the hard targets.
• Model size in a DNN can increase up to tens of GB.

[Diagram: training a DNN on (input, target) pairs vs. training a shallow network on (input, DNN output) pairs]

Page 65

Word Embedding

Language understanding (semantics)

• One-hot vector representation:
  dog: 0 1 0 0 0 0 0 0 0 0
  cat: 0 0 1 0 0 0 0 0 0 0
• A word embedding is a function mapping words to dense, real-valued vectors:
  dog: 0.3 0.2 0.1 0.5 0.7
  cat: 0.2 0.8 0.3 0.1 0.9

[Figure: nearest neighbors of a few words] [Vinyals 14]
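A minimal sketch of an embedding lookup: multiplying a one-hot vector by an embedding matrix is just selecting a row; the vocabulary, dimensions, and random values are illustrative assumptions.

```python
import numpy as np

vocab = {"dog": 1, "cat": 2}          # word -> index in a 10-word vocabulary
E = np.random.default_rng(0).normal(size=(10, 5))   # 10 words x 5-dim embeddings

def embed(word):
    one_hot = np.zeros(10)
    one_hot[vocab[word]] = 1.0
    return one_hot @ E                # equivalent to E[vocab[word]]

print(embed("dog"))
print(np.allclose(embed("cat"), E[vocab["cat"]]))   # True
```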

Page 66

Continuous-Time RNN (CTRNN)

• A dynamical-system model of a biological neural network (walking, biking, etc.)
• Ordinary differential equations model the effect of training on a neuron (trained using a genetic algorithm)
• Input nodes, hidden nodes, and output nodes (a subset of the hidden nodes)

Update equation:

$\tau_i \frac{dy_i}{dt} = -y_i + \sum_j W_{ji}\, \sigma\big(g_j (y_j - b_j)\big) + I_i$

where $dy_i/dt$ is the rate of change of the activation of the postsynaptic neuron, $\tau_i$ the time constant, $g_j$ the gain, $b_j$ the bias, $W_{ji}$ the weight of the connection between neurons $j$ and $i$, $I_i$ the external input for neuron $i$, and $\sigma$ the nonlinear function.
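A minimal sketch of integrating the CTRNN update equation above with a forward-Euler step; the network size, parameter values, and step size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
tau = np.ones(n)                       # time constants tau_i
g = np.ones(n)                         # gains g_j
b = np.zeros(n)                        # biases b_j
W = rng.normal(0, 1, (n, n))           # W[j, i]: weight from neuron j to neuron i
I = np.array([0.5, 0.0, 0.0, 0.0])     # external input I_i
sigma = lambda z: 1.0 / (1.0 + np.exp(-z))

y = np.zeros(n)
dt = 0.01
for _ in range(1000):
    # tau_i * dy_i/dt = -y_i + sum_j W_ji * sigma(g_j * (y_j - b_j)) + I_i
    dy = (-y + W.T @ sigma(g * (y - b)) + I) / tau
    y = y + dt * dy
print(y)   # network activations after integrating for 10 time units
```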

Page 67

4. Applications

Page 68

Convolutional Neural Network (CNN)

• Handwritten digit recognition [LeCun 98]
• Architecture: N x (convolution + subsampling) layers followed by M fully connected layers
• A neural network that makes use of prior knowledge about images

[Diagram: feature extraction stage followed by classification stage]

Page 69

Convolutional Neural Network (CNN)

• Incorporate prior knowledge about images
  • Locality: each pixel is only related to a small neighborhood of pixels -> local connectivity

Page 70

Convolutional Neural Network (CNN)

• Incorporate prior knowledge about images
  • Locality: each pixel is only related to a small neighborhood of pixels -> local connectivity
  • Stationarity: image statistics are invariant over all image locations -> shared weights

Page 71

Convolutional Neural Network (CNN)

• Convolution kernels with learned parameters
• Learn multiple kernels (filters)
• Still far fewer parameters than a fully connected model
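A minimal sketch of a single 2D convolution (valid mode, stride 1), written as the cross-correlation most CNN libraries actually compute; the image and the edge-detecting kernel are illustrative assumptions.

```python
import numpy as np

def conv2d(img, kernel):
    # Valid "convolution" (cross-correlation), stride 1
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

img = np.arange(25, dtype=float).reshape(5, 5)
edge_kernel = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]], dtype=float)
print(conv2d(img, edge_kernel))   # 3x3 feature map
```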

Page 72

Convolutional Neural Network (CNN)

• Subsampling (pooling)
  • N x N region -> 1 value
  • Max pooling, average pooling
  • Invariance to small translations
  • Larger receptive fields in upper layers
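A minimal 2x2 max-pooling sketch with non-overlapping windows; the window size and input are illustrative assumptions.

```python
import numpy as np

def max_pool(x, k=2):
    # Non-overlapping k x k max pooling; assumes the input divides evenly by k.
    H, W = x.shape
    return x.reshape(H // k, k, W // k, k).max(axis=(1, 3))

x = np.array([[1, 2, 5, 6],
              [3, 4, 7, 8],
              [9, 0, 1, 2],
              [3, 5, 4, 6]], dtype=float)
print(max_pool(x))   # [[4, 8], [9, 6]]
```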

Page 73

Convolutional Neural Network (CNN)

• Backpropagation
  • Convolution layer: dE/dW is the error summed and propagated from all nodes in which the weight W occurs
  • Pooling layer:
    • Max pooling: the error is propagated back to the max node only
    • Average pooling: the error is uniformly propagated back to all pooled nodes

Page 74

Application: Image Classification

• ImageNet Large-Scale Visual Recognition Challenge (ILSVRC, 2010~)
  • Image classification / localization
  • 1,200,000 (1.2M) labeled images, 1000 classes
• 2012: a CNN won the contest by a large margin, and CNNs have dominated the contest since
  • 2012: 15.3% top-5 error (2nd place: 26.2%)
  • 2013: 11.2%
  • 2014: 6.7%

Page 76

SuperVision Team

[Photo: Geoffrey Hinton (right), Alex Krizhevsky, and Ilya Sutskever (left)]

Page 77

ImageNet Challenge 2012

[Krizhevsky 12]
• Deep: 5 conv. layers + 3 fully connected layers
• Trained using 2 GPUs
• Top-5 error: 15.3% vs. 26.2% (2nd place, non-CNN)

Page 78

ImageNet Challenge 2012

[Krizhevsky 12]
• ReLU
• Overfitting prevention
  • Data augmentation: random translation, horizontal flip, color perturbation
  • Dropout
    • Randomly sets node activations to 0
    • Has the effect of simultaneously learning multiple architectures
    • Reduces co-adaptation between neurons [Hinton 12]

Page 79

ImageNet Challenge 2013

[Zeiler 13]: winning submission by Clarifai. Awesome performance!
• (Training details not revealed; see the related publication)
• Applied modifications to [Krizhevsky 12] guided by visualizing the features from each conv. layer

Page 84

ImageNet Challenge 2013

[Howard 13]
• Utilizes the entire input image instead of cropping out the edges (as opposed to [Krizhevsky 12])

[Sermanet 13]
• Multi-scale training
• Efficient computation of dense localization

Page 85

ImageNet Challenge 2014

[Lin 14]: "Network-in-Network"
• Replace the convolution filter with a multilayer perceptron
• Nonlinear: better abstraction

Page 86

ImageNet Challenge 2014

[Lin 14]: "Network-in-Network"
• Replace the convolution filter with a multilayer perceptron
• Nonlinear: better abstraction
• Can replace the full connection with simple averaging

Page 87

ImageNet Challenge 2014

[Lin 14]: "Network-in-Network" (continued; same summary as the previous slide)

Page 88

ImageNet Challenge 2014

[Lin 14]: "Network-in-Network" (continued)

[Figure: CNN vs. NIN architecture comparison]

Page 89

ImageNet Challenge 2014

[Lin 14]: "Network-in-Network" (continued)

Page 90

ImageNet Challenge 2014

[Lin 14]: "Network-in-Network"
• Replace the convolution filter with a multilayer perceptron
• Nonlinear: better abstraction
• Can replace the full connection with simple averaging
• Equivalent to a 1x1 convolution
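A minimal sketch showing that a 1x1 convolution is just a per-pixel linear map across channels, followed here by global average pooling in place of a fully connected layer; the feature-map shape and values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 8, 16))        # H x W x C feature map
W_1x1 = rng.normal(size=(16, 10))         # 1x1 conv: 16 input -> 10 output channels

out = feat @ W_1x1                        # applied independently at every pixel
print(out.shape)                          # (8, 8, 10)

class_scores = out.mean(axis=(0, 1))      # global average pooling over H and W
print(class_scores.shape)                 # (10,)
```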

Page 91

ImageNet Challenge 2014

[Szegedy 14]: "GoogLeNet"

[Diagram: full GoogLeNet architecture; legend: convolution, pooling, softmax, other]

Page 92

ImageNet Challenge 2014

[Szegedy 14]: "GoogLeNet"
• 22-layer network trained on 16k CPU cores [Dean 12]
• 9 "Inception" modules (multi-scale convolution)
• Average pooling
• Auxiliary classifiers
• 12x fewer parameters than [Krizhevsky 12] (which has 60,000,000)

Page 93

ImageNet Challenge 2014

[Szegedy 14]: "GoogLeNet"
• "Inception" modules (multi-scale convolution)
  • Heterogeneous concatenation of multi-scale convolutions
  • [Arora 14]: "cluster correlated neurons together"
  • 1x1 convolutions used for dimension reduction

Page 95

ImageNet Challenge 2014

[Wu 15] (Baidu)
• Beats GoogLeNet: 6.67% -> 5.98% top-5 error
• Custom-built supercomputer: 4 GPUs x 36 nodes (Nvidia Tesla K40m)
• Aggressive data augmentation
• Multi-scale training with high-resolution images

Page 96

ImageNet Challenge 2014

[Wu 15] (Baidu), continued

[Figure: examples of data augmentation]

Page 97

ImageNet Challenge 2015

• ImageNet Challenge 2015 is open (submission deadline: November 13, 2015)
• Progress in top-5 error:
  • 2012 non-CNN: 26.2%
  • 2012 AlexNet: 15.3%
  • 2013 Clarifai: 11.2%
  • 2014 GoogLeNet: 6.7%
  • pre-2015 (Google): 4.9%
• Beyond human-level performance

[ImageNet Challenge]

Page 98

ImageNet Challenge

• Common recipes:
  • Deep (many conv layers), ReLU, dropout
  • Random-crop training (translation, horizontal flip)
  • Multi-scale or random-scale training
  • Color perturbation
  • Multi-crop testing
  • Multi-model averaging
• Focus is gradually moving from classification to classification + localization

Page 99

Auto Caption

• Auto Caption (Google)
• NeuralTalk (Stanford Univ.): http://cs.stanford.edu/people/karpathy/deepimagesent/

Page 100

Auto Caption: Show and Tell, a Neural Image Caption Generator [Vinyals 14]

• Text-image multimodal learning
• Learns a mapping between the image space and the word space
• Generates a sentence describing an image, and finds the image matching a given sentence
• CNN (convolutional neural net) + RNN (recurrent neural net)

[Figure: images with their true describing sentences]

Page 101

Auto Caption: Deep Visual-Semantic Alignments for Generating Image Descriptions (Stanford Univ.) [Karpathy 14]

• Generates dense, free-form descriptions of images
• Infers region-word alignments using R-CNN [Girshick 13] + BRNN + MRF
• Image segmentation (graph cut + disjoint union)

Page 102

Auto Caption: Deep Visual-Semantic Alignments for Generating Image Descriptions (Stanford Univ.) [Karpathy 14]

• Infers region-word alignments using R-CNN + BRNN + MRF
• Image-sentence alignment score between image $k$ (with region set $g_k$) and sentence $l$ (with word set $g_l$):

$S_{kl} = \sum_{t \in g_l} \sum_{i \in g_k} \max(0,\, v_i^{T} s_t)$

• Image regions are embedded through an h x 4096 matrix (h is 1000~1600); words come from a t-dimensional word dictionary
• Combined with their additional multiple-instance learning objective

[Figure: alignment results with a BRNN vs. an RNN]

Page 103

Auto Caption: Deep Visual-Semantic Alignments for Generating Image Descriptions (Stanford Univ.) [Karpathy 14]

Smoothing with an MRF
• Aligning each word to its best region independently is noisy; the MRF encourages similar (neighboring) words to be arranged onto nearby regions
• The argmin can be found with dynamic programming
• Output: (word, region) alignments

Page 104

Auto Caption

• Generation methods for auto captioning:
  1) Compose descriptions directly from recognized content
  2) Retrieve relevant existing text given recognized content

Related work
• Compose descriptions given recognized content: Yao et al. (2010), Yang et al. (2011), Li et al. (2011), Kulkarni et al. (2011)
• Generation as retrieval: Farhadi et al. (2010), Ordonez et al. (2011), Gupta et al. (2012), Kuznetsova et al. (2012)
• Generation using pre-associated relevant text: Leong et al. (2010), Aker and Gaizauskas (2010), Feng and Lapata (2010a)
• Other (image annotation, video description, etc.): Barnard et al. (2003), Pastra et al. (2003), Gupta et al. (2008), Gupta et al. (2009), Feng and Lapata (2010b), del Pero et al. (2011), Krishnamoorthy et al. (2012), Barbu et al. (2012), Das et al. (2013)

Page 106

Other Vision Applications

• Sequence-to-sequence learning [Sutskever 14]
• Sequence representations (1000-D) projected to 2-D with PCA
  • Sensitive to word order
  • Invariant to active vs. passive voice

Page 107

Other Vision Applications

• Data regularities are captured in a multimodal vector space
• Vector arithmetic is possible in a multimodal representation (in a Euclidean space) [Kiros 14]

vec(QS rank) + vec(gist) = vec(world ranking 2)

Page 108

Hierarchical RNN for Skeleton-Based Action Recognition [Yong 15]

• The human body is divided into five parts (two arms, two legs, trunk)
• The movements of these individual parts are modeled by a network composed of 9 layers (BRNNs, fusion layers, a fully connected layer)

Page 109

Leading experts in deep learning

Page 110

Summary

• Deep architectures perform better than existing shallow ones because they learn hierarchical representations of data
• It is now possible to train deep neural networks thanks to the availability of:
  • Large-scale training data
  • High-performance computing devices
  • Newly developed training algorithms and techniques
• Common rules of thumb for improving the performance of a DNN:
  • Make it deeper and larger (ensuring that it does not overfit)
  • Use ReLU for faster convergence and dropout as regularization
  • Apply various data augmentation schemes to increase the effective amount of training data
  • Average predictions from multiple models and input crops

Page 111

Resources

http://deeplearning.net/

• "Learning Deep Architectures for AI" by Y. Bengio, 2009
• "Deep Learning in Neural Networks: An Overview" by J. Schmidhuber, 2014
• "Machine Learning to Deep Learning" by 곽동민

• DBN (Science paper's code): Hinton (Matlab)
  http://www.cs.toronto.edu/~hinton/MatlabForSciencePaper.html
• Convolutional neural networks: LeCun
• cuda-convnet: Alex Krizhevsky, Hinton (Python, C++)
  https://code.google.com/p/cuda-convnet/
• Caffe: UC Berkeley (C++)
  http://caffe.berkeleyvision.org/
• pylearn2: Bengio (Python)
  https://github.com/lisa-lab/pylearn2
• CURRENNT: Weninger et al. (München) (C++)
  http://sourceforge.net/projects/currennt/
• Libraries: Torch (http://torch.ch/), Theano (http://deeplearning.net/software/theano/)

Page 112

THANK YOU