Deep Convolutional Nets
11th March 2015
Jiaxin Shi
Tsinghua University
A Brief Introduction to CNN
• The replicated feature approach
  • Use many different copies of the same feature detector at different positions.
    – Replication greatly reduces the number of free parameters to be learned.
  • Use several different feature types, each with its own map of replicated detectors.
    – This allows each patch of the image to be represented in several ways.
A Brief Introduction to CNN
• What does replicating the feature detectors achieve?
• Equivariant activities: replicated features do not make the neural activities invariant to translation; the activities are equivariant.
• Invariant knowledge: if a feature is useful in some locations during training, detectors for that feature will be available in all locations during testing.
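A minimal NumPy sketch of this equivariance (my own illustration, not from the slides): translating the input translates the feature map by the same amount, so the activities move rather than stay fixed.

    import numpy as np
    from scipy.signal import correlate2d

    # A toy vertical-edge detector and an image containing a vertical edge.
    kernel = np.array([[-1., 0., 1.],
                       [-1., 0., 1.],
                       [-1., 0., 1.]])
    image = np.zeros((8, 8))
    image[:, 4] = 1.0

    shifted = np.roll(image, 1, axis=1)          # translate input right by 1 pixel

    fmap = correlate2d(image, kernel, mode='valid')
    fmap_shifted = correlate2d(shifted, kernel, mode='valid')

    # Equivariance: the response to the shifted image is the shifted response
    # (away from the border columns).
    print(np.allclose(fmap_shifted[:, 1:], fmap[:, :-1]))   # True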
A Brief Introduction to CNN
• Pooling the outputs of replicated feature detectors
• Get a small amount of translational invariance at each level by averaging four neighboring replicated detectors to give a single output to the next level.
  – This reduces the number of inputs to the next layer of feature extraction, allowing us to have many more different feature maps.
  – Taking the maximum of the four works slightly better.
• Problem: after several levels of pooling, we have lost information about the precise positions of things.
  – This makes it impossible to use the precise spatial relationships between high-level parts for recognition.
A Brief Introduction to CNN
• Terminology
  [Figure: a 5x5 kernel scans the image with stride 2 and padding 1, producing feature maps 0–3]
• Convolution layer (5x5, 2, 1, 4) = (kernel size, stride, padding, number of kernels); a sketch of the output-size arithmetic follows.
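The spatial size of each feature map follows directly from these four numbers. A quick sketch (the 32x32 input size is a hypothetical choice of mine):

    def conv_output_size(input_size, kernel, stride, padding):
        """Output spatial size of a convolution: floor((W - K + 2P) / S) + 1."""
        return (input_size - kernel + 2 * padding) // stride + 1

    # The layer above: kernel 5x5, stride 2, padding 1, 4 kernels.
    # On a 32x32 input it yields four 15x15 feature maps.
    print(conv_output_size(32, kernel=5, stride=2, padding=1))   # 15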
A Brief Introduction to CNN
• Terminology
  [Figure: a 4x4 pooling window with stride 4 and no padding reduces each of the 4 feature maps]
• Pooling layer (4x4, 4, 0) = (pooling size, stride, padding); a code sketch follows.
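A non-overlapping pooling layer like this one can be sketched in a few lines of NumPy; the reshape trick below assumes the window equals the stride, as in the (4x4, 4, 0) layer above:

    import numpy as np

    def max_pool(fmap, size=4):
        """Non-overlapping max pooling: window == stride, no padding."""
        h, w = fmap.shape
        blocks = fmap[:h - h % size, :w - w % size]
        blocks = blocks.reshape(h // size, size, w // size, size)
        return blocks.max(axis=(1, 3))           # use .mean(axis=(1, 3)) for average pooling

    fmap = np.random.rand(16, 16)
    print(max_pool(fmap).shape)                  # (4, 4): each 4x4 block -> its max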
A Brief Introduction to CNN
• An example – a ‘VW’ detector
  [Figure: a 1-channel input image containing a ‘V’. Layer 1 applies 3 filters (detectors for the strokes \, /, ^), giving a 3-channel output. Layer 2 applies 2 filters, each spanning the 3 channels (2x3), giving a 2-channel output: a ‘V’ detector and a ‘W’ detector.]
A Brief Introduction to CNN
• An example – a ‘VW’ detector (continued)
  [Figure: the same two-layer network applied to an input containing a ‘W’; this time the ‘W’-detector output channel responds.]
History
• 1979, Neocognitron (Fukushima), the first convolutional net. Fukushima, however, did not set the weights by supervised backpropagation, but by local unsupervised learning rules.
• 1989, LeNet (LeCun), backpropagation for convolutional NNs. LeCun re-invented the CNN with BP.
• 1992, Cresceptron (Weng et al., 1992), max pooling. Later integrated with CNNs (MPCNN).
• 2006, CNNs trained on GPUs (Chellapilla et al., 2006).
• 2011, multi-column GPU-MPCNNs (Ciresan et al., 2011), superhuman performance. The first system to achieve superhuman visual pattern recognition, in the IJCNN 2011 traffic sign recognition contest.
• 2012, ImageNet breakthrough (Krizhevsky et al., 2012). AlexNet, trained on GPUs, won the ImageNet competition.
Outline
• Recent Progress of Supervised Convolutional Nets
  • AlexNet
  • GoogLeNet
  • VGGNet
• Small Break: Microsoft’s Tricks
• Representation Learning and Bayesian Approach
  • Deconvolutional Networks
  • Bayesian Deep Deconvolutional Networks
Recent Progress of Supervised Convolutional Nets
• AlexNet, 2012
  • The architecture that made the 2012 ImageNet breakthrough.
  • NIPS 2012, “ImageNet Classification with Deep Convolutional Neural Networks”.
  • A general practical guide to training deep supervised convnets.
  • Main techniques
    • ReLU nonlinearity
    • Data augmentation
    • Dropout
    • Overlapping pooling
    • Mini-batch SGD with momentum and weight decay
Recent Progress of Supervised Convolutional Nets
• AlexNet, 2012
  • Dropout
    • Reduces overfitting
    • Model averaging
  • A brief proof of the model-averaging view: the test-time prediction is $y = \arg\max_{k'}(\mathrm{out}_{k'})$; with a softmax output layer, running the full net with halved weights computes the normalized geometric mean of all dropped-out sub-models’ outputs, so the argmax agrees with the model average.
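A minimal sketch of dropout at training time (this is the common “inverted dropout” variant, which rescales during training; AlexNet itself instead multiplies the outputs by 0.5 at test time):

    import numpy as np

    def dropout(x, p_drop=0.5, train=True, rng=np.random):
        """Inverted dropout: zero units with probability p_drop, rescale the
        survivors so no change is needed at test time."""
        if not train:
            return x                              # test time: full net, unscaled
        mask = (rng.rand(*x.shape) >= p_drop) / (1.0 - p_drop)
        return x * mask

    h = np.random.randn(4, 8)                     # a hypothetical hidden activation
    print(dropout(h))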
Recent Progress of Supervised Convolutional Nets
• AlexNet, 2012
  • Dropout
    • Encourages sparsity
Recent Progress of Supervised Convolutional Nets
• GoogLeNet, 2014
  • The 2014 ImageNet competition winner.
  • CNNs can go further if carefully tuned.
  • Main techniques
    • Carefully designed Inception architecture
    • Network in Network
    • Deeply Supervised Nets
Recent Progress of Supervised Convolutional Nets
• GoogLeNet, 2014
  • Main techniques
    • Network in Network
Recent Progress of Supervised Convolutional Nets
• GoogLeNet, 2014
  • Main techniques
    • Deeply Supervised Nets: associate a “companion” classification output with each hidden layer (see the objective sketched below).
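Schematically, the training objective becomes the output loss plus a weighted companion loss per hidden layer (notation mine, simplified from the Deeply Supervised Nets formulation):

$$\mathcal{L}(W) = \ell_{\text{out}}(W) + \sum_{l} \alpha_l \, \ell_l(W, w_l)$$

where $w_l$ are the weights of the companion classifier attached to hidden layer $l$ and $\alpha_l$ controls its contribution.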
Recent Progress of Supervised Convolutional Nets
• VGGNet, 2014
  • A simple architecture that stays state of the art, in contrast to GoogLeNet-like structures (which are very hard to tune).
  • Developed by Oxford (later DeepMind) people. Based on Zeiler & Fergus’s 2013 work.
  • The most widely used architecture now.
  • Small filters (3x3) and small stride (1); see the receptive-field sketch below.
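Why small filters work (a worked example of mine, not from the slides): two stacked 3x3, stride-1 convolutions cover a 5x5 receptive field and three cover 7x7, with fewer parameters and more nonlinearities than one large filter; for $C$ channels in and out, $3 \cdot 3^2 C^2 = 27C^2$ weights versus $7^2 C^2 = 49C^2$.

    def stacked_receptive_field(kernel=3, layers=2):
        """Receptive field of stacked stride-1 convolutions: 1 + layers*(kernel-1)."""
        return 1 + layers * (kernel - 1)

    print(stacked_receptive_field(3, 2))   # 5: two 3x3 convs see a 5x5 region
    print(stacked_receptive_field(3, 3))   # 7: three 3x3 convs see a 7x7 region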
Outline
• Recent Progress of Supervised Convolutional Nets
  • AlexNet
  • GoogLeNet
  • VGGNet
• Small Break: Microsoft’s Tricks
• Representation Learning and Bayesian Approach
  • Deconvolutional Networks
  • Bayesian Deep Deconvolutional Networks
Representation Learning and Bayesian Approach
• Deconvolutional Networks, Zeiler & Fergus, CVPR 2010
  • A deep layered model for representation learning
  • An optimization perspective
  • Results are better than previous representation-learning methods, but there is still a gap to supervised CNN models.
Representation Learning and Bayesian Approach
• Deconvolutional Networks, Zeiler & Fergus, CVPR 2010
  • $K_1$: number of filters (dictionaries).
  • $y(n,c)$: channel $c$ of the $n$th image.
  • $D(k,c)$: channel $c$ of the $k$th filter (dictionary).
  • $W^1(n,k)$: sparse feature map; indicates the position and pixel-wise strength of $D(k,\cdot)$ in image $n$.
  • Cost function of the first layer ($K_0$: number of channels; $K_1$: number of filters), specializing the layer-$l$ cost given later:
    $$C_1(y(n)) = \frac{\lambda}{2}\sum_{c=1}^{K_0}\Big\|\sum_{k=1}^{K_1} W^1(n,k) * D(k,c) - y(n,c)\Big\|_2^2 + \sum_{k=1}^{K_1}\big|W^1(n,k)\big|^p$$
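A direct NumPy transcription of this first-layer cost (my own sketch, with $p = 1$ and hypothetical sizes):

    import numpy as np
    from scipy.signal import convolve2d

    def layer1_cost(y, W, D, lam=1.0, p=1):
        """Reconstruction error plus sparsity, following the slide's notation.
        y: (K0, H, Wd) image; W: (K1, H+h-1, Wd+w-1) feature maps; D: (K1, K0, h, w).
        Feature maps are larger than the image so a 'valid' convolution of each
        map with a filter reproduces the image size."""
        recon = 0.0
        for c in range(y.shape[0]):
            est = sum(convolve2d(W[k], D[k, c], mode='valid')
                      for k in range(W.shape[0]))
            recon += np.sum((est - y[c]) ** 2)
        return lam / 2.0 * recon + np.sum(np.abs(W) ** p)

    y = np.random.rand(3, 16, 16)           # a 3-channel 16x16 image
    D = np.random.randn(4, 3, 5, 5)         # four 5x5 filters
    W = np.random.randn(4, 20, 20)          # 16 + 5 - 1 = 20
    print(layer1_cost(y, W, D))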
Representation Learning and Bayesian Approach
• Deconvolutional Networks, Zeiler & Fergus, CVPR 2010
  • Stacking layers ($K_l$: layer $l$’s number of channels):
    $$C_l(W^{l-1}(n)) = \frac{\lambda}{2}\sum_{c=1}^{K_{l-1}}\Big\|\sum_{k=1}^{K_l} W^l(n,k) * D(k,c) - W^{l-1}(n,c)\Big\|_2^2 + \sum_{k=1}^{K_l}\big|W^l(n,k)\big|^p$$
  • Learning process
    • Optimize layer by layer.
    • Optimize over the feature maps $W^l(n,k)$.
    • Optimize over the filters (dictionaries) $D(k,c)$.
Representation Learning and Bayesian Approach
• Deconvolutional Networks, Zeiler & Fergus, CVPR 2010
  • Learning process
    • Optimize layer by layer.
    • Optimize over the feature maps $W^l(n,k)$.
      • When $p \ge 1$, the problem is convex.
      • But it is poorly conditioned, due to the feature maps being coupled to one another by the filters. (Why? Every feature map contributes to the same reconstruction, so a gradient step on one map interacts, through the filters, with all the others.)
    • Optimize over the filters (dictionaries) $D(k,c)$, using gradient descent.
Representation Learning and Bayesian Approach
• Deconvolutional Networks, Zeiler & Fergus, CVPR 2010
  • Learning process
    • Optimize layer by layer.
    • Optimize over the feature maps $W^l(n,k)$.
      • When $p \ge 1$, convex.
      • But poorly conditioned, due to the feature maps being coupled to one another by the filters.
    • Solution: introduce auxiliary variables $x^l(n,k)$ that carry the sparsity term, coupled to the feature maps by a quadratic penalty:
      $$C_l(W^{l-1}(n)) = \frac{\lambda}{2}\sum_{c=1}^{K_{l-1}}\Big\|\sum_{k=1}^{K_l} W^l(n,k) * D(k,c) - W^{l-1}(n,c)\Big\|_2^2 + \sum_{k=1}^{K_l}\big|x^l(n,k)\big|^p + \sum_{k=1}^{K_l}\big\|x^l(n,k) - W^l(n,k)\big\|_2^2$$
Representation Learning and Bayesian Approach
• Deconvolutional Networks, Zeiler & Fergus, CVPR 2010
  • Performance: slightly outperforms SIFT-based approaches and the CDBN. [Figure: results omitted]
Representation Learning and Bayesian Approach
• Bayesian Deep Deconvolutional Learning, Yunchen Pu et al., 2015
  • A deep layered model for representation learning
  • A Bayesian perspective
  • Claims state-of-the-art classification performance using the learned representations
Representation Learning and Bayesian Approach
• Bayesian Deep Deconvolutional Learning, Yunchen Pu et al., 2015
  • $X(n)$: the $n$th image.
  • $S(n,k)$: indicates which shifted version of $D(k)$ is used to represent $X(n)$.
  • $W(n,k)$: indicates the pixel-wise strength of $D(k)$.
  • Compared to the Deconvolutional Networks paper
    • $S(n,k) \odot W(n,k)$ here is effectively an explicit version of the sparse feature map $W^l(n,k)$ in the 2010 paper.
Representation Learning and Bayesian Approach
• Bayesian Deep Deconvolutional Learning, Yunchen Pu et al., 2015
  • Priors [Figure: the prior specifications on the dictionary elements, weights, and shift indicators]
Representation Learning and Bayesian Approach
• Bayesian Deep Deconvolutional Learning, Yunchen Pu et al., 2015
  • Pooling
    • Within each block of $S(n,k_l,l)$, either all $n_x n_y$ pixels are zero, or only one pixel is non-zero, with the position of that pixel selected stochastically via a multinomial distribution. A sketch of this rule follows.
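A sketch of this stochastic pooling rule (my own transcription; the real model samples the position from its posterior, whereas this sketch simply uses the block magnitudes as the multinomial probabilities):

    import numpy as np

    def stochastic_pool_mask(S, nx=2, ny=2, rng=np.random.default_rng()):
        """For each nx-by-ny block of S, keep at most one non-zero entry,
        its position drawn from a multinomial over the block's magnitudes."""
        H, W = S.shape
        out = np.zeros_like(S)
        for i in range(0, H, nx):
            for j in range(0, W, ny):
                blk = S[i:i + nx, j:j + ny]
                mags = np.abs(blk).ravel()
                if mags.sum() == 0:
                    continue                       # an all-zero block stays zero
                pos = rng.choice(mags.size, p=mags / mags.sum())
                u, v = divmod(pos, blk.shape[1])
                out[i + u, j + v] = blk[u, v]
        return out

    print(stochastic_pool_mask(np.random.randn(4, 4)))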
Representation Learning and Bayesian Approach
• Learning process
  • Bottom to top: Gibbs sampling, with MAP samples selected
  • Top to bottom: refinement
Representation Learning and Bayesian Approach
• Bayesian Deep Deconvolutional Learning, Yunchen Pu et al., 2015
  • Intuition of deconvolutional networks (generative)
    • An image is made up of patches.
    • These patches are weighted transformations of dictionary elements.
    • We learn the dictionaries from training data.
    • A new image is then represented by the positions and weights of its dictionary elements.
  • Intuition of convolutional networks
    • An image is made up of patches.
    • We can learn feature detectors for various kinds of patches.
    • We then use these feature detectors to scan a new image, and classify it based on the features (kinds of patches) detected.
  • Both are translation equivariant.
Representation Learning and Bayesian Approach
• Performance [Figure: classification results omitted]
Discussion
• Deep supervised CNNs still have limits. Where does further improvement lie?
• Why does Bayesian learning of deconvolutional representations work much better than the optimization-perspective approaches?
Thank you.