
Page 1:

Deep Convolutional Nets

11th March 2015

Jiaxin Shi

Tsinghua University

Page 2:

A Brief Introduction to CNN

• The replicated feature approach
• Use many different copies of the same feature detector at different positions.
  – Replication greatly reduces the number of free parameters to be learned (see the sketch below).
• Use several different feature types, each with its own map of replicated detectors.
  – This allows each patch of the image to be represented in several ways.
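A back-of-the-envelope sketch of why replication reduces free parameters, with hypothetical sizes (not from the slides): connecting a 32x32 image to a 28x28 grid of detector positions needs one weight per input-output pair if fully connected, but only one shared 5x5 kernel if the detector is replicated.

```python
# Hypothetical sizes: 32x32 input, 28x28 grid of detector positions.
fully_connected = (32 * 32) * (28 * 28)  # a separate weight for every pair
replicated = 5 * 5                       # one 5x5 detector shared across positions
print(fully_connected, replicated)       # 802816 vs. 25 free parameters
```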

Page 3:

A Brief Introduction to CNN

• What does replicating the feature detectors achieve?
• Equivariant activities: replicated features do not make the neural activities invariant to translation; the activities are equivariant.
• Invariant knowledge: if a feature is useful in some locations during training, detectors for that feature will be available in all locations during testing.

Page 4:

A Brief Introduction to CNN

• Pooling the outputs of replicated feature detectors
• Get a small amount of translational invariance at each level by averaging four neighboring replicated detectors to give a single output to the next level (see the sketch after this list).
  – This reduces the number of inputs to the next layer of feature extraction, thus allowing us to have many more different feature maps.
  – Taking the maximum of the four works slightly better.
• Problem: after several levels of pooling, we have lost information about the precise positions of things.
  – This makes it impossible to use the precise spatial relationships between high-level parts for recognition.
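A minimal numpy sketch of both pooling variants over non-overlapping 2x2 blocks; the function name and the input sizes are our own illustration, not the slides'.

```python
import numpy as np

def pool_2x2(fmap, mode="max"):
    """Pool non-overlapping 2x2 blocks of a single feature map."""
    h, w = fmap.shape
    blocks = fmap[: h - h % 2, : w - w % 2].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))

fmap = np.arange(16.0).reshape(4, 4)
print(pool_2x2(fmap, "mean"))  # average of the four neighbors
print(pool_2x2(fmap, "max"))   # the maximum works slightly better in practice
```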

Page 5:

A Brief Introduction to CNN

• Terminology
  [Figure: a 5x5 kernel applied to the image. Kernel: 5x5.]

Page 6:

A Brief Introduction to CNN

• Terminology
  [Figure: the kernel slides over the image in steps of 2 pixels. Kernel: 5x5; Stride: 2.]

Page 7:

A Brief Introduction to CNN

• Terminology
  [Figure: the image border is padded by 1 pixel. Kernel: 5x5; Stride: 2; Padding: 1.]

Page 8:

A Brief Introduction to CNN

• Terminology
  [Figure: the layer produces feature maps 0–3. Convolution Layer (5x5, 2, 1, 4) = (kernel size, stride, padding, number of kernels); Kernel: 5x5; Stride: 2; Padding: 1. The output size is computed below.]
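The output size of such a layer follows the standard arithmetic $\lfloor (n + 2p - k)/s \rfloor + 1$. A small sketch; the 32x32 input size is a hypothetical choice:

```python
def conv_output_size(n, k, s, p):
    """Spatial output size of a convolution layer: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

# Hypothetical 32x32 input through the slide's layer (5x5, stride 2, padding 1):
print(conv_output_size(32, k=5, s=2, p=1))  # -> 15, i.e. four 15x15 feature maps
```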

Page 9:

A Brief Introduction to CNN

• Terminology
  [Figure: the 4 feature maps are pooled. Pooling Layer (4x4, 4, 0) = (pooling size, stride, padding). The same output-size arithmetic applies, below.]
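Continuing the hypothetical 15x15 maps from the sketch above, the same formula covers the pooling layer:

```python
# Pooling layer (4x4, stride 4, padding 0) on a hypothetical 15x15 map:
# floor((15 + 0 - 4) / 4) + 1 = 3, i.e. each feature map pools down to 3x3.
print((15 + 2 * 0 - 4) // 4 + 1)  # -> 3
```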

Page 10:

A Brief Introduction to CNN

• An example – a ‘VW’ detector
  [Figure: a 1-channel input image containing ‘V’ and ‘W’ strokes; layer 1 produces a 3-channel output.]

Page 11:

A Brief Introduction to CNN

• An example – a ‘VW’ detector
  [Figure: the 1-channel input passes through layer 1 (3 filters/detectors, among them a ‘V’ detector and a ‘W’ detector), giving a 3-channel output; layer 2 has 2 filters (detectors) of depth 3 (“2x3”), giving 2 output channels.]

Page 12:

A Brief Introduction to CNN

• An example – a ‘VW’ detector
  [Figure: the same network shown responding to a ‘W’ in the input; layer 1 filters: 3, layer 2 filters: 2x3, output channels: 2. See the sketch below.]
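To make the channel bookkeeping concrete, here is a naive multi-channel valid convolution; all names and sizes are our own sketch, not the slides' code. Layer 1 applies 3 single-channel detectors; each layer-2 detector is 3 channels deep (the slide's “2x3”), so it can respond to conjunctions of layer-1 features such as a ‘V’ next to a ‘W’.

```python
import numpy as np

def conv2d(x, filters):
    """Naive valid convolution: x is (C, H, W), filters are (K, C, h, w)."""
    K, C, h, w = filters.shape
    H, W = x.shape[1] - h + 1, x.shape[2] - w + 1
    out = np.zeros((K, H, W))
    for k in range(K):
        for i in range(H):
            for j in range(W):
                out[k, i, j] = np.sum(x[:, i:i + h, j:j + w] * filters[k])
    return out

image = np.random.rand(1, 12, 12)                  # 1-channel input (sizes hypothetical)
maps1 = conv2d(image, np.random.rand(3, 1, 5, 5))  # layer 1: 3 detectors -> 3 channels
maps2 = conv2d(maps1, np.random.rand(2, 3, 5, 5))  # layer 2: 2 detectors, each 3 deep
print(maps1.shape, maps2.shape)                    # (3, 8, 8) (2, 4, 4)
```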

Page 13:

History

• 1979, Neocognitron (Fukushima), the first convolutional net. Fukushima, however, did not set the weights by supervised backpropagation, but by local unsupervised learning rules.
• 1989, LeNet (LeCun), backpropagation for convolutional NNs. LeCun re-invented the CNN with BP.
• 1992, Cresceptron (Weng et al., 1992), max pooling. Later integrated with CNNs (MPCNN).
• 2006, CNN trained on GPU (Chellapilla et al., 2006).
• 2011, multi-column GPU-MPCNNs (Ciresan et al., 2011), superhuman performance. The first system to achieve superhuman visual pattern recognition, in the IJCNN 2011 traffic sign recognition contest.
• 2012, ImageNet breakthrough (Krizhevsky et al., 2012). AlexNet, trained on GPUs, won the ImageNet competition.

Page 14:

Outline

• Recent Progress of Supervised Convolutional Nets
  • AlexNet
  • GoogLeNet
  • VGGNet
• Small Break: Microsoft’s Tricks
• Representation Learning and Bayesian Approach
  • Deconvolutional Networks
  • Bayesian Deep Deconvolutional Networks



Page 17:

Recent Progress of Supervised Convolutional Nets

• AlexNet, 2012
  • The architecture that made the 2012 ImageNet breakthrough.
  • NIPS 2012, ImageNet Classification with Deep Convolutional Neural Networks.
  • A general practical guide to training deep supervised convnets.
  • Main techniques
    • ReLU nonlinearity
    • Data augmentation
    • Dropout
    • Overlapping pooling
    • Mini-batch SGD with momentum and weight decay

Page 18:

Recent Progress of Supervised Convolutional Nets

• AlexNet, 2012
• Dropout
  • Reduces overfitting.
  • Model averaging – a brief proof sketch: for an input $x$, the network predicts $y = \arg\max_{k'} (\mathrm{out}_{k'})$; at test time, running the full network with the dropped weights halved approximates the geometric mean of the predictions of all dropped-out sub-networks (see the sketch below).
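A minimal sketch of the two dropout phases; the rate and shapes are hypothetical. Training drops units at random; testing keeps every unit but scales activations, which approximates averaging over all $2^N$ sub-networks.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_train(h, p=0.5):
    """Training: independently zero each unit with probability p."""
    return h * (rng.random(h.shape) >= p)

def dropout_test(h, p=0.5):
    """Testing: keep all units, scale by (1 - p); this approximates the
    geometric mean of the predictions of all 2^N dropped-out sub-networks."""
    return h * (1 - p)

h = np.ones(8)
print(dropout_train(h), dropout_test(h))
```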

Page 19:

Recent Progress of Supervised Convolutional Nets

• AlexNet, 2012
• Dropout
  • Encourages sparsity.

Page 20:

Recent Progress of Supervised Convolutional Nets

• GoogLeNet, 2014
  • The 2014 ImageNet competition winner.
  • CNNs can go further if carefully tuned.
  • Main techniques
    • Carefully designed Inception architecture
    • Network in Network
    • Deeply Supervised Nets

Deep Convolutional Nets

Jiaxin Shi 11th March 2015 Tsinghua University

Recent Progress of Supervised Convolutional Nets• GoogLeNet, 2014

• The 2014 ImageNet competition winner.

• CNN can go further if carefully tuned.

• Main techniques

• Carefully designed inception architecture

• Network in Network

• Deeply Supervised Nets

Page 22:

Recent Progress of Supervised Convolutional Nets

• GoogLeNet, 2014 – main techniques
  • Network in Network (see the sketch below)
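Network in Network's mlpconv layers amount to 1x1 convolutions: a per-pixel linear map across channels followed by a nonlinearity, which GoogLeNet also uses for cheap channel reduction. A numpy sketch with hypothetical sizes:

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution: x is (C_in, H, W), w is (C_out, C_in)."""
    c_out, c_in = w.shape
    return (w @ x.reshape(c_in, -1)).reshape(c_out, *x.shape[1:])

x = np.random.rand(64, 7, 7)                     # hypothetical 64-channel feature maps
print(conv1x1(x, np.random.rand(16, 64)).shape)  # (16, 7, 7): channels reduced cheaply
```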


Page 24:

Recent Progress of Supervised Convolutional Nets

• GoogLeNet, 2014 – main techniques
  • Deeply Supervised Nets
    • Associate a “companion” classification output with each hidden layer (see the sketch below).
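A sketch of how deep supervision combines objectives: each hidden layer's companion classifier contributes a weighted loss to the total. The weight `alpha` is a hypothetical choice, and the function is our own illustration, not the paper's code.

```python
def deeply_supervised_loss(companion_losses, output_loss, alpha=0.3):
    """Total objective: final classification loss plus weighted companion
    losses, one per supervised hidden layer."""
    return output_loss + alpha * sum(companion_losses)

print(deeply_supervised_loss([0.9, 0.7, 0.5], output_loss=0.4))  # 0.4 + 0.3 * 2.1
```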


Page 26:

Recent Progress of Supervised Convolutional Nets

• VGGNet, 2014
  • A simple and consistently state-of-the-art architecture, compared to GoogLeNet-like structures (which are very hard to tune).
  • Developed by Oxford (later DeepMind) people. Based on Zeiler & Fergus’s 2013 work.
  • The most widely used architecture now.
  • Small filters (3x3) and small stride (1); see the arithmetic below.
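The arithmetic behind “small filter, small stride”: two stacked 3x3 convolutions cover a 5x5 receptive field, and three cover 7x7, with fewer weights per channel pair and extra nonlinearities in between.

```python
# Weights per input-output channel pair (ignoring biases):
print(2 * 3 * 3, "vs", 5 * 5)  # two 3x3 layers: 18 weights vs. one 5x5 layer: 25
print(3 * 3 * 3, "vs", 7 * 7)  # three 3x3 layers: 27 weights vs. one 7x7 layer: 49
```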


Page 28:

Outline

• Recent Progress of Supervised Convolutional Nets
  • AlexNet
  • GoogLeNet
  • VGGNet
• Small Break: Microsoft’s Tricks
• Representation Learning and Bayesian Approach
  • Deconvolutional Networks
  • Bayesian Deep Deconvolutional Networks


Page 30:

Representation Learning and Bayesian Approach

• Deconvolutional Networks, Zeiler & Fergus, CVPR 2010
  • A deep layered model for representation learning.
  • Takes an optimization perspective.
  • Results are better than previous representation-learning methods, but there is still a gap to supervised CNN models.

Page 31:

Representation Learning and Bayesian Approach

• Deconvolutional Networks, Zeiler & Fergus, CVPR 2010
  • $K_1$: number of filters (dictionaries).
  • $W_0(n,c)$: channel $c$ of the $n$th image.
  • $D(k,c)$: channel $c$ of the $k$th filter (dictionary).
  • $W_1(n,k)$: sparse feature map; indicates the position and pixel-wise strength of $D(k)$.
  • Cost function of the first layer ($K_0$: number of channels; $K_1$: number of filters (dictionaries)):

$$C_1\big(W_0(n)\big) = \frac{\lambda}{2} \sum_{c=1}^{K_0} \Big\| \sum_{k=1}^{K_1} W_1(n,k) \ast D(k,c) - W_0(n,c) \Big\|_2^2 + \sum_{k=1}^{K_1} \big| W_1(n,k) \big|^p$$
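A numpy sketch of evaluating this first-layer cost under hypothetical shapes (the full 2-D convolution of each feature map with each filter channel must match the image size); `layer1_cost` and all sizes are our own illustration, not the paper's code.

```python
import numpy as np
from scipy.signal import convolve2d

def layer1_cost(W1, D, image, lam=1.0, p=1):
    """First-layer deconvnet cost (a sketch): reconstruction error plus
    sparsity. W1: (K1, H, W) feature maps; D: (K1, C, h, w) filters;
    image: (C, H + h - 1, W + w - 1), matching the full convolution."""
    K1, C = D.shape[0], D.shape[1]
    recon = sum(
        np.sum((sum(convolve2d(W1[k], D[k, c], mode="full") for k in range(K1))
                - image[c]) ** 2)
        for c in range(C))
    return lam / 2.0 * recon + np.sum(np.abs(W1) ** p)

# Tiny example with hypothetical sizes: 2 filters, a 1-channel 8x8 image.
W1 = np.random.rand(2, 6, 6)
D = np.random.rand(2, 1, 3, 3)
image = np.random.rand(1, 8, 8)
print(layer1_cost(W1, D, image))
```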

Page 32:

Representation Learning and Bayesian Approach

• Deconvolutional Networks, Zeiler & Fergus, CVPR 2010
  • Stack layers: layer $l$ reconstructs the feature maps of layer $l-1$ ($K_l$: layer $l$'s number of channels).
  • Learning process
    • Optimize layer by layer.
    • Optimize over the feature maps $W_l(n,k)$.
    • Optimize over the filters (dictionaries) $D(k,c)$.

Page 33:

Representation Learning and Bayesian Approach

• Deconvolutional Networks, Zeiler & Fergus, CVPR 2010
  • Stack layers ($K_l$: layer $l$'s number of channels).
  • Learning process
    • Optimize layer by layer.
    • Optimize over the feature maps $W_l(n,k)$.
      • When $p \ge 1$, the problem is convex.
      • But it is poorly conditioned, due to the feature maps being coupled to one another by the filters. (Why?)
    • Optimize over the filters (dictionaries) $D(k,c)$, using gradient descent.

Page 34:

Representation Learning and Bayesian Approach

• Deconvolutional Networks, Zeiler & Fergus, CVPR 2010
  • Learning process
    • Optimize layer by layer.
    • Optimize over the feature maps $W_l(n,k)$.
      • When $p \ge 1$, convex.
      • But poorly conditioned, due to the feature maps being coupled to one another by the filters.
      • Solution: introduce auxiliary variables $x_l(n,k)$ to decouple the sparsity term from the reconstruction term (an update sketch follows the formula):

$$C_l\big(W_{l-1}(n)\big) = \frac{\lambda}{2} \sum_{c=1}^{K_{l-1}} \Big\| \sum_{k=1}^{K_l} W_l(n,k) \ast D(k,c) - W_{l-1}(n,c) \Big\|_2^2 + \sum_{k=1}^{K_l} \big| x_l(n,k) \big|^p + \sum_{k=1}^{K_l} \big\| x_l(n,k) - W_l(n,k) \big\|_2^2$$
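With the auxiliary variables in place, optimization can alternate between the $W_l$-update (now a sparsity-free least-squares problem) and the $x_l$-update, which has a closed form when $p = 1$: elementwise soft-thresholding. A minimal sketch; the function name and threshold are ours.

```python
import numpy as np

def soft_threshold(w, t):
    """Closed-form x-update when p = 1 (a sketch): shrink each
    feature-map value toward zero by threshold t."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

print(soft_threshold(np.array([-1.5, -0.2, 0.4, 2.0]), t=0.5))
# -> [-1.   0.   0.   1.5]
```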

Page 35:

Representation Learning and Bayesian Approach

• Deconvolutional Networks, Zeiler & Fergus, CVPR 2010
  • Performance: slightly outperforms SIFT-based approaches and CDBN.

Page 36:

Representation Learning and Bayesian Approach

• Bayesian Deep Deconvolutional Learning, Yunchen, 2015
  • A deep layered model for representation learning.
  • Takes a Bayesian perspective.
  • Claims state-of-the-art classification performance using the learned representations.

Page 37:

Representation Learning and Bayesian Approach

• Bayesian Deep Deconvolutional Learning, Yunchen, 2015
  • $X(n)$: the $n$th image.
  • $S(n,k)$: indicates which shifted version of the dictionary element $D(k)$ is used to represent $X(n)$.
  • $W(n,k)$: indicates the pixel-wise strength of $S(n,k)$.
  • Compared to the Deconvolutional Networks paper
    • The combination of $S(n,k)$ and $W(n,k)$ here is actually an explicit version of the sparse feature maps in the 2010 paper.

Page 38:

Representation Learning and Bayesian Approach

• Bayesian Deep Deconvolutional Learning, Yunchen, 2015
  • Priors [the prior distributions shown on the slide are not preserved in the transcript]

Page 39:

Representation Learning and Bayesian Approach

• Bayesian Deep Deconvolutional Learning, Yunchen, 2015
  • Pooling (see the sampling sketch below)
    • Within each block of $S(n,k_l,l)$, either all $n_x n_y$ pixels are zero, or only one pixel is non-zero, with the position of that pixel selected stochastically via a multinomial distribution.
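A sketch of sampling one pooling block of $S$ under this prior; the block size and probabilities are hypothetical. With the leftover probability mass the block is all zero; otherwise exactly one pixel is set, at a position drawn from a multinomial.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_block(pixel_probs):
    """Sample one n_x-by-n_y block: all zeros, or a single non-zero pixel
    at a multinomially chosen position. pixel_probs must sum to <= 1;
    the remaining mass is the probability of the all-zero block."""
    flat = np.append(pixel_probs.ravel(), 1.0 - pixel_probs.sum())
    idx = rng.choice(flat.size, p=flat)
    block = np.zeros(pixel_probs.size)
    if idx < pixel_probs.size:
        block[idx] = 1.0
    return block.reshape(pixel_probs.shape)

print(sample_block(np.full((2, 2), 0.2)))  # 2x2 block, hypothetical probabilities
```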


Page 42:

Representation Learning and Bayesian Approach

• Learning process
  • Bottom to top: Gibbs sampling, with MAP samples selected.
  • Top to bottom: refinement.

Page 43:

Representation Learning and Bayesian Approach

• Bayesian Deep Deconvolutional Learning, Yunchen, 2015
  • Intuition of deconvolutional networks (generative)
    • An image is made up of patches.
    • These patches are weighted transformations of dictionary elements.
    • We learn the dictionaries from training data.
    • A new image is then represented by the positions and weights of the dictionaries.
  • Intuition of convolutional networks
    • An image is made up of patches.
    • We can learn feature detectors for various kinds of patches.
    • Then we use these feature detectors to scan a new image, and classify it based on the features (kinds of patches) detected.
  • Both are translation-equivariant.

Page 44:

Representation Learning and Bayesian Approach

• Performance [Figure: results table not preserved in the transcript]

Page 45:

Discussion

• Deep supervised CNNs still have limits. Where does further improvement lie?
• Why does Bayesian learning of deconvolutional representations work much better than the optimization-perspective approaches?

Page 46:

Thank you.