[pr12] capsule networks - jaejun yoo

Post on 22-Jan-2018


Capsule Networks

Understanding it together with PR12

Jaejun Yoo, Ph.D. Candidate @ KAIST

PR12

17th Dec, 2017

Today’s contents

Dynamic Routing Between Capsules

by Sara Sabour, Nicholas Frosst, Geoffrey E. Hinton

Oct. 2017: https://arxiv.org/abs/1710.09829

NIPS 2017 Paper

Convolutional Neural Networks

What is the problem with CNNs?

Contents from https://hackernoon.com/what-is-a-capsnet-or-capsule-network-2bfbe48769cc

1) If images are rotated, tilted, or otherwise differently oriented, CNNs perform poorly.

2) In a CNN, each layer understands the image at a much more granular level (slow increase in receptive field).

DATA AUGMENTATION, MAX POOLING


“Pooling helps in creating the positional invariance. [But] this invariance also triggers false positives for images which have the components of a ship but not in the correct order.”


This was never the intention of the pooling layer!


What we need: EQUIVARIANCE (not invariance)


“Equivariance makes a CNN understand the rotation or proportion change and adapt itself accordingly so that the spatial positioning inside an image is not lost.”

Capsules

“A capsule is a group of neurons whose activity vector represents the instantiation parameters of a specific type of entity such as an object or an object part.”

8D capsule e.g.

Hue, position, size, orientation, deformation, texture, etc.

Contents from https://www.slideshare.net/aureliengeron/introduction-to-capsule-networks-capsnets


8D vector

Inverse Rendering


Equivariance of Capsules


Contents from https://medium.com/ai%C2%B3-theory-practice-business/understanding-hintons-capsule-networks-part-iii-dynamic-routing-between-capsules-349f6d30418

Routing by Agreement

Aurélien Géron, 2017

Primary Capsules


Predict Next Layer’s Output


One transformation matrix Wi,j per part/whole pair (i, j).

ûj|i = Wi,j ui
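As a sketch, the prediction step ûj|i = Wi,j ui can be written with numpy; the capsule counts and dimensions below (two 8D primary capsules, three 16D next-layer capsules) are hypothetical placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 2 primary (part) capsules of dim 8,
# 3 next-layer (whole) capsules of dim 16.
n_in, d_in, n_out, d_out = 2, 8, 3, 16

u = rng.standard_normal((n_in, d_in))                # primary capsule outputs u_i
W = rng.standard_normal((n_in, n_out, d_out, d_in))  # one matrix W_ij per (i, j) pair

# u_hat[i, j] = W[i, j] @ u[i]: capsule i's prediction for capsule j's output
u_hat = np.einsum('ijkl,il->ijk', W, u)
print(u_hat.shape)  # (2, 3, 16)
```

In a trained CapsNet the Wi,j are learned by backpropagation; here they are random only to show the shapes.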


Compute Next Layer’s Output


Predicted Outputs


Routing by Agreement


Strong agreement!


The rectangle and triangle capsules should be routed to the boat capsule.


Routing Weights


bi,j = 0 for all i, j

ci = softmax(bi)

(so every routing weight starts out equal, at 0.5 in the figure)
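A minimal numpy sketch of this initialization; the 2×2 shape just mirrors the figure (two primary capsules, two next-layer capsules) and is otherwise arbitrary:

```python
import numpy as np

def routing_softmax(b):
    """c_i = softmax(b_i): normalize each lower capsule's routing
    logits over all next-layer capsules j (rows sum to 1)."""
    e = np.exp(b - b.max(axis=1, keepdims=True))  # numerically stable
    return e / e.sum(axis=1, keepdims=True)

b = np.zeros((2, 2))   # b_ij = 0 for all i, j before round 1
c = routing_softmax(b)
print(c)               # every routing weight starts at 0.5
```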

Compute Next Layer’s Output


sj = weighted sum


vj = squash(sj)
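The squash nonlinearity from the paper is vj = (||sj||² / (1 + ||sj||²)) · sj / ||sj||; a small numpy version (the epsilon guard is my addition for numerical safety):

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Shrink short vectors toward zero length and long vectors toward
    unit length, so a capsule's norm can act as a probability."""
    sq = np.sum(s * s, axis=axis, keepdims=True)      # ||s||^2
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)

v = squash(np.array([3.0, 4.0]))   # ||s|| = 5
print(np.linalg.norm(v))           # 25/26 ≈ 0.9615: close to unit length
```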


Actual outputs of the next layer capsules (round #1)


Update Routing Weights


Agreement: bi,j += ûj|i · vj


When a prediction agrees, ûj|i · vj is large, so bi,j grows.


When a prediction disagrees, ûj|i · vj is small.
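Putting the rounds together, the routing-by-agreement loop (Procedure 1 in the paper) can be sketched in numpy; the three-iteration count follows the paper, while the capsule counts and dimensions below are illustrative:

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    sq = np.sum(s * s, axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)

def dynamic_routing(u_hat, n_iters=3):
    """Route predictions u_hat[i, j, :] from lower capsule i to
    next-layer capsule j by iterative agreement."""
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))                       # routing logits, start at 0
    for _ in range(n_iters):
        e = np.exp(b - b.max(axis=1, keepdims=True))
        c = e / e.sum(axis=1, keepdims=True)          # c_i = softmax(b_i)
        s = np.einsum('ij,ijk->jk', c, u_hat)         # s_j = weighted sum
        v = squash(s)                                 # v_j = squash(s_j)
        b = b + np.einsum('ijk,jk->ij', u_hat, v)     # b_ij += u_hat_j|i . v_j
    return v, c

rng = np.random.default_rng(1)
v, c = dynamic_routing(rng.standard_normal((4, 2, 8)))
print(v.shape, c.shape)   # (2, 8) (4, 2)
```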


Compute Next Layer’s Output


(updated routing weights, e.g. 0.2, 0.1, 0.8, 0.9)


Actual outputs of the next layer capsules (round #2)


Handling Crowded Scenes


Is this an upside-down house?


House + Boat

Thanks to routing by agreement, the ambiguity is quickly resolved (explaining away).


Classification CapsNet

The ℓ2 norm of each output capsule’s activation vector gives the estimated class probability.


Training

To allow multiple classes, minimize the margin loss:

Lk = Tk max(0, m+ − ||vk||)² + λ (1 − Tk) max(0, ||vk|| − m−)²

Tk = 1 iff class k is present.

In the paper: m+ = 0.9, m− = 0.1, λ = 0.5.
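With those constants, the margin loss can be sketched in numpy (the capsule norms and labels below are made-up illustrative values):

```python
import numpy as np

def margin_loss(v_norms, T, m_pos=0.9, m_neg=0.1, lam=0.5):
    """L_k = T_k max(0, m+ - ||v_k||)^2 + lam (1 - T_k) max(0, ||v_k|| - m-)^2,
    summed over classes k."""
    present = T * np.maximum(0.0, m_pos - v_norms) ** 2
    absent = lam * (1.0 - T) * np.maximum(0.0, v_norms - m_neg) ** 2
    return float((present + absent).sum())

# Confident, correct capsules incur no loss ...
print(margin_loss(np.array([0.95, 0.05]), np.array([1.0, 0.0])))  # 0.0
# ... while a weak capsule for a present class is penalized.
print(margin_loss(np.array([0.5]), np.array([1.0])))              # ≈ 0.16 (= 0.4²)
```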


Translated to English: “If an object of class k is present, then ||vk|| should be no less than 0.9. If not, then ||vk|| should be no more than 0.1.”


Regularization by Reconstruction

A feedforward neural network acts as a decoder, producing a reconstruction of the input image.



Loss = margin loss + α · reconstruction loss

The reconstruction loss is the squared difference between the reconstructed image and the input image. In the paper, α = 0.0005.
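A minimal sketch of the combined loss; the 28×28 image and the margin-loss value are placeholders:

```python
import numpy as np

def total_loss(margin, image, reconstruction, alpha=0.0005):
    """Loss = margin loss + alpha * reconstruction loss, where the
    reconstruction loss is the summed squared pixel difference."""
    recon = float(np.sum((image - reconstruction) ** 2))
    return margin + alpha * recon

img = np.ones((28, 28))
# All-ones target vs. all-zeros reconstruction: 784 pixels off by 1.
print(total_loss(0.2, img, np.zeros_like(img)))  # ≈ 0.2 + 0.0005 * 784 = 0.592
```

The tiny α keeps the reconstruction term from dominating the margin loss during training.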


A CapsNet for MNIST

(Figure 1 from the paper)


A CapsNet for MNIST – Decoder

(Figure 2 from the paper)


Interpretable Activation Vectors

(Figure 4 from the paper)


Pros

● Reaches high accuracy on MNIST, and promising on CIFAR10

● Requires less training data

● Position and pose information are preserved (equivariance)

● This is promising for image segmentation and object detection

● Routing by agreement is great for overlapping objects (explaining away)

● Capsule activations nicely map the hierarchy of parts

● Offers robustness to affine transformations

● Activation vectors are easier to interpret (rotation, thickness, skew…)

● It’s Hinton! ;-)


Cons

● Not state of the art on CIFAR10 (but it’s a good start)

● Not tested yet on larger images (e.g., ImageNet): will it work well?

● Slow training, due to the inner loop (in the routing-by-agreement algorithm)

● A CapsNet cannot see two very close identical objects

○ This is called “crowding”, and it has been observed in human vision as well

Results

What the individual dimensions of a capsule represent

Results

MultiMNIST: Segmenting Highly Overlapping Digits

Remaining Questions

Do capsules really work the way real neurons do?

Perceptual illusions:

Thompson, P. (1980). Margaret Thatcher: a new illusion. Perception, 9(4), 483–484.

References

• https://arxiv.org/abs/1710.09829 (paper)

• https://jhui.github.io/2017/11/03/Dynamic-Routing-Between-Capsules/

• https://hackernoon.com/what-is-a-capsnet-or-capsule-network-2bfbe48769cc

• https://medium.com/ai%C2%B3-theory-practice-business/understanding-hintons-capsule-networks-part-i-intuition-b4b559d1159b

• https://www.youtube.com/watch?v=pPN8d0E3900 (video)

• https://www.slideshare.net/aureliengeron/introduction-to-capsule-networks-capsnets (video slides)
