[pr12] capsule networks - jaejun yoo

Post on 22-Jan-2018


Capsule Networks

Understanding it together with PR12

Jaejun Yoo, Ph.D. Candidate @ KAIST

PR12

17th Dec, 2017

Today’s contents

Dynamic Routing Between Capsules

by Sara Sabour, Nicholas Frosst, Geoffrey E. Hinton

Oct. 2017: https://arxiv.org/abs/1710.09829

NIPS 2017 Paper

Convolutional Neural Networks

What is the problem with CNNs?

Contents from https://hackernoon.com/what-is-a-capsnet-or-capsule-network-2bfbe48769cc

1) If images are rotated, tilted, or otherwise differently oriented, CNNs perform poorly.

2) In a CNN, each layer understands the image at a much more granular level (slow increase in receptive field).

DATA AUGMENTATION, MAX POOLING


“Pooling helps in creating the positional invariance. [But] this invariance also triggers false positives for images which have the components of a ship but not in the correct order.”


This was never the intention of the pooling layer!


What we need: EQUIVARIANCE (not invariance)


“Equivariance makes a CNN understand the rotation or proportion change and adapt itself accordingly so that the spatial positioning inside an image is not lost.”

Capsules

“A capsule is a group of neurons whose activity vector represents the instantiation parameters of a specific type of entity such as an object or an object part.”

8D capsule e.g.

Hue, position, size, orientation, deformation, texture, etc.

Contents from https://www.slideshare.net/aureliengeron/introduction-to-capsule-networks-capsnets


8D vector

Inverse Rendering


Equivariance of Capsules


Contents from https://medium.com/ai%C2%B3-theory-practice-business/understanding-hintons-capsule-networks-part-iii-dynamic-routing-between-capsules-349f6d30418

Routing by Agreement

Aurélien Géron, 2017

Primary Capsules


Predict Next Layer’s Output


One transformation matrix Wi,j per part/whole pair (i, j).

ûj|i = Wi,j ui
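As a sketch, the prediction step ûj|i = Wi,j ui can be written with numpy; the capsule counts and dimensions below (two 8D primary capsules, three 16D next-layer capsules) are hypothetical placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 2 primary (part) capsules of dim 8,
# 3 next-layer (whole) capsules of dim 16.
n_in, d_in, n_out, d_out = 2, 8, 3, 16

u = rng.standard_normal((n_in, d_in))                # primary capsule outputs u_i
W = rng.standard_normal((n_in, n_out, d_out, d_in))  # one matrix W_ij per (i, j) pair

# u_hat[i, j] = W[i, j] @ u[i]: capsule i's prediction for capsule j's output
u_hat = np.einsum('ijkl,il->ijk', W, u)
print(u_hat.shape)  # (2, 3, 16)
```

In a trained CapsNet the Wi,j are learned by backpropagation; here they are random only to show the shapes.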


Compute Next Layer’s Output


Predicted Outputs


Routing by Agreement


Strong agreement!


The rectangle and triangle capsules should be routed to the boat capsule.


Routing Weights


bi,j = 0 for all i, j

ci = softmax(bi)

(so every routing weight starts out equal, at 0.5 in the figure)
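A minimal numpy sketch of this initialization; the 2×2 shape just mirrors the figure (two primary capsules, two next-layer capsules) and is otherwise arbitrary:

```python
import numpy as np

def routing_softmax(b):
    """c_i = softmax(b_i): normalize each lower capsule's routing
    logits over all next-layer capsules j (rows sum to 1)."""
    e = np.exp(b - b.max(axis=1, keepdims=True))  # numerically stable
    return e / e.sum(axis=1, keepdims=True)

b = np.zeros((2, 2))   # b_ij = 0 for all i, j before round 1
c = routing_softmax(b)
print(c)               # every routing weight starts at 0.5
```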

Compute Next Layer’s Output


sj = weighted sum


vj = squash(sj)
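The squash nonlinearity from the paper is vj = (||sj||² / (1 + ||sj||²)) · sj / ||sj||; a small numpy version (the epsilon guard is my addition for numerical safety):

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Shrink short vectors toward zero length and long vectors toward
    unit length, so a capsule's norm can act as a probability."""
    sq = np.sum(s * s, axis=axis, keepdims=True)      # ||s||^2
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)

v = squash(np.array([3.0, 4.0]))   # ||s|| = 5
print(np.linalg.norm(v))           # 25/26 ≈ 0.9615: close to unit length
```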


Actual outputs of the next layer capsules (round #1)


Update Routing Weights


Agreement: bi,j += ûj|i · vj


When a prediction agrees, ûj|i · vj is large, so bi,j grows.


When a prediction disagrees, ûj|i · vj is small.
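Putting the rounds together, the routing-by-agreement loop (Procedure 1 in the paper) can be sketched in numpy; the three-iteration count follows the paper, while the capsule counts and dimensions below are illustrative:

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    sq = np.sum(s * s, axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)

def dynamic_routing(u_hat, n_iters=3):
    """Route predictions u_hat[i, j, :] from lower capsule i to
    next-layer capsule j by iterative agreement."""
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))                       # routing logits, start at 0
    for _ in range(n_iters):
        e = np.exp(b - b.max(axis=1, keepdims=True))
        c = e / e.sum(axis=1, keepdims=True)          # c_i = softmax(b_i)
        s = np.einsum('ij,ijk->jk', c, u_hat)         # s_j = weighted sum
        v = squash(s)                                 # v_j = squash(s_j)
        b = b + np.einsum('ijk,jk->ij', u_hat, v)     # b_ij += u_hat_j|i . v_j
    return v, c

rng = np.random.default_rng(1)
v, c = dynamic_routing(rng.standard_normal((4, 2, 8)))
print(v.shape, c.shape)   # (2, 8) (4, 2)
```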


Compute Next Layer’s Output


(updated routing weights, e.g. 0.2, 0.1, 0.8, 0.9)


Actual outputs of the next layer capsules (round #2)


Handling Crowded Scenes


Is this an upside-down house?


House + Boat

Thanks to routing by agreement, the ambiguity is quickly resolved (explaining away).


Classification CapsNet

The ℓ2 norm of each output capsule’s activation vector gives the estimated class probability.


Training

To allow multiple classes, minimize the margin loss:

Lk = Tk max(0, m+ − ||vk||)² + λ (1 − Tk) max(0, ||vk|| − m−)²

Tk = 1 iff class k is present.

In the paper: m+ = 0.9, m− = 0.1, λ = 0.5.
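With those constants, the margin loss can be sketched in numpy (the capsule norms and labels below are made-up illustrative values):

```python
import numpy as np

def margin_loss(v_norms, T, m_pos=0.9, m_neg=0.1, lam=0.5):
    """L_k = T_k max(0, m+ - ||v_k||)^2 + lam (1 - T_k) max(0, ||v_k|| - m-)^2,
    summed over classes k."""
    present = T * np.maximum(0.0, m_pos - v_norms) ** 2
    absent = lam * (1.0 - T) * np.maximum(0.0, v_norms - m_neg) ** 2
    return float((present + absent).sum())

# Confident, correct capsules incur no loss ...
print(margin_loss(np.array([0.95, 0.05]), np.array([1.0, 0.0])))  # 0.0
# ... while a weak capsule for a present class is penalized.
print(margin_loss(np.array([0.5]), np.array([1.0])))              # ≈ 0.16 (= 0.4²)
```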


Translated to English: “If an object of class k is present, then ||vk|| should be no less than 0.9. If not, then ||vk|| should be no more than 0.1.”


Regularization by Reconstruction

A feedforward neural network acts as a decoder, producing a reconstruction of the input image.



Loss = margin loss + α · reconstruction loss

The reconstruction loss is the squared difference between the reconstructed image and the input image. In the paper, α = 0.0005.
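A minimal sketch of the combined loss; the 28×28 image and the margin-loss value are placeholders:

```python
import numpy as np

def total_loss(margin, image, reconstruction, alpha=0.0005):
    """Loss = margin loss + alpha * reconstruction loss, where the
    reconstruction loss is the summed squared pixel difference."""
    recon = float(np.sum((image - reconstruction) ** 2))
    return margin + alpha * recon

img = np.ones((28, 28))
# All-ones target vs. all-zeros reconstruction: 784 pixels off by 1.
print(total_loss(0.2, img, np.zeros_like(img)))  # ≈ 0.2 + 0.0005 * 784 = 0.592
```

The tiny α keeps the reconstruction term from dominating the margin loss during training.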


A CapsNet for MNIST

(Figure 1 from the paper)


A CapsNet for MNIST – Decoder

(Figure 2 from the paper)


Interpretable Activation Vectors

(Figure 4 from the paper)


Pros

● Reaches high accuracy on MNIST, and promising on CIFAR10

● Requires less training data

● Position and pose information are preserved (equivariance)

● This is promising for image segmentation and object detection

● Routing by agreement is great for overlapping objects (explaining away)

● Capsule activations nicely map the hierarchy of parts

● Offers robustness to affine transformations

● Activation vectors are easier to interpret (rotation, thickness, skew…)

● It’s Hinton! ;-)


Cons

● Not state of the art on CIFAR10 (but it’s a good start)

● Not tested yet on larger images (e.g., ImageNet): will it work well?

● Slow training, due to the inner loop (in the routing-by-agreement algorithm)

● A CapsNet cannot see two very close identical objects

○ This is called “crowding”, and it has been observed in human vision as well

Results

What the individual dimensions of a capsule represent

Results

MultiMNIST: Segmenting Highly Overlapping Digits

Remaining Questions

Do capsules really work the way real neurons do?

Perceptual illusions:

Thompson, P. (1980). Margaret Thatcher: a new illusion. Perception, 9(4), 483–484.

References

• https://arxiv.org/abs/1710.09829 (paper)

• https://jhui.github.io/2017/11/03/Dynamic-Routing-Between-Capsules/

• https://hackernoon.com/what-is-a-capsnet-or-capsule-network-2bfbe48769cc

• https://medium.com/ai%C2%B3-theory-practice-business/understanding-hintons-capsule-networks-part-i-intuition-b4b559d1159b

• https://www.youtube.com/watch?v=pPN8d0E3900 (video)

• https://www.slideshare.net/aureliengeron/introduction-to-capsule-networks-capsnets (video slides)
