Deep Learning : Hopfield Networks

TRANSCRIPT

Page 1: Deep Learning : Hopfield Networks

Lecture 11 | 2015. 08. 12 | Do Hoerin


Page 2: Deep Learning : Hopfield Networks

Hopfield Networks (Lecture 11a)


Page 3: Deep Learning : Hopfield Networks

Introducing Hopfield Networks

• Energy-based model
• Composed of binary threshold units with recurrent connections
• Hard to analyze: such networks can settle to a stable state, oscillate, or behave chaotically
• If the connections are symmetric, there is a global energy function

$E = -\sum_i s_i b_i - \tfrac{1}{2}\sum_{i,j} s_i s_j w_{ij}, \qquad \Delta E_i = b_i + \sum_j s_j w_{ij}$
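As a concrete illustration of these two formulas, here is a minimal NumPy sketch; the weight matrix, biases, and state vector are made-up toy values, not anything from the lecture.

```python
import numpy as np

def energy(s, W, b):
    """Global energy E = -sum_i s_i b_i - 1/2 sum_{i,j} s_i s_j w_ij."""
    return -s @ b - 0.5 * s @ W @ s

def energy_gap(s, W, b, i):
    """Energy gap of unit i: Delta E_i = b_i + sum_j s_j w_ij."""
    return b[i] + W[i] @ s

# Toy example: 3 binary units, symmetric weights, zero diagonal (no self-connections).
W = np.array([[ 0.0, 2.0, -1.0],
              [ 2.0, 0.0,  3.0],
              [-1.0, 3.0,  0.0]])
b = np.zeros(3)
s = np.array([1.0, 0.0, 1.0])
print(energy(s, W, b), energy_gap(s, W, b, 0))
```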


Page 4: Deep Learning : Hopfield Networks

Settling to an Energy Minimum

• Start from a random state
• Update one unit at a time, in random order

−E = goodness = 3

[Diagram: five binary units with current states 1, 0, 1, 0, 0 and connection weights −4, 3, 2, 3, 3, −1, −1]

Page 5: Deep Learning : Hopfield Networks

Settling to an Energy Minimum

−E = goodness = 3

[Diagram: the same network; the states are still 1, 0, 1, 0, 0]

Page 6: Deep Learning : Hopfield Networks

Settling to an Energy Minimum

−E = goodness = 4; settled to a minimum

[Diagram: the same network after one unit has turned on; states 1, 1, 1, 0, 0]

Page 7: Deep Learning : Hopfield Networks

Settling to an Energy Minimum


Page 8: Deep Learning : Hopfield Networks

Settling to an Energy Minimum

• Two triangles in which three units mostly support each other

• Why do decisions need to be sequential?
• If units are updated simultaneously, the energy can go up
• Parallel updating can also produce oscillations

[Diagram: the same network with states 0, 1, 0, 1, 1]

−E = goodness = 5
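The sequential-update rule described above can be sketched in a few lines. This is a minimal, hypothetical settling loop (deterministic binary threshold units, updated one at a time in random order), not code from the lecture; the weights are toy values.

```python
import numpy as np

def settle(s, W, b, rng, sweeps=10):
    """Asynchronous updates of binary threshold units.

    Each probed unit turns on iff its energy gap b_i + sum_j s_j w_ij > 0.
    Because only one unit changes at a time, the energy never goes up."""
    n = len(s)
    for _ in range(sweeps):
        for i in rng.permutation(n):            # one unit at a time, random order
            s[i] = 1.0 if b[i] + W[i] @ s > 0 else 0.0
    return s

rng = np.random.default_rng(0)
W = np.array([[ 0.0, 3.0, -1.0],                # toy symmetric weights, zero diagonal
              [ 3.0, 0.0,  2.0],
              [-1.0, 2.0,  0.0]])
b = np.zeros(3)
s = rng.integers(0, 2, size=3).astype(float)    # random starting state
print(settle(s, W, b, rng))
```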

Page 9: Deep Learning : Hopfield Networks

A neat way to make use of this type of computation

• Hopfield proposed that memories could be energy minima of a neural net
• The net can then fill in a memory from an incomplete part
• Using energy minima to represent memories gives a content-addressable memory


Page 10: Deep Learning : Hopfield Networks

Storing Memories in a Hopfield Net

• We can store a binary state vector by incrementing the weight between any two units by the product of their activities
• Biases are treated as weights from a permanently on unit

$\Delta w_{ij} = s_i s_j$
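A minimal sketch of that storage rule, assuming (as is common) that the stored state vectors use +1/−1 activities; the two memories below are made up.

```python
import numpy as np

def store(memories):
    """One-shot Hebbian storage: for every memory s, increment w_ij by s_i * s_j."""
    _, n = memories.shape
    W = np.zeros((n, n))
    for s in memories:
        W += np.outer(s, s)                 # delta w_ij = s_i * s_j
    np.fill_diagonal(W, 0.0)                # no self-connections
    return W

# Two made-up memories over 6 units (+1/-1 states).
memories = np.array([[ 1, -1,  1,  1, -1, -1],
                     [-1, -1,  1, -1,  1,  1]], dtype=float)
W = store(memories)
```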


Page 11: Deep Learning : Hopfield Networks

Dealing with Spurious Minima in Hopfield Nets (Lecture 11b)


Page 12: Deep Learning : Hopfield Networks

Spurious Minima Limit Capacity

• Capacity: about 0.15N memories for a net of N units
• After storing M memories, each connection weight is an integer in the range [−M, M]
• The number of bits required to store the weights and biases is $N^2 \log_2(2M + 1)$ (worked example below)
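A quick, made-up numerical check of that bit count:

With $N = 100$ units and $M = 15$ stored memories (i.e. $0.15N$): $N^2 \log_2(2M + 1) = 100^2 \log_2 31 \approx 10{,}000 \times 4.95 \approx 49{,}500$ bits.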


Page 13: Deep Learning : Hopfield Networks

The Storage Capacity of a Hopfield Net

• Each time we memorize, we hope to create a new energy minimum

• But what if two nearby minima merge to create a minimum at an intermediate location?


Page 14: Deep Learning : Hopfield Networks

Increasing the Capacity

• Unlearning: gets rid of spurious minima and increases the memory capacity
• Unlearning vs. REM sleep?
• Pseudo-likelihood: instead of trying to store vectors in one shot, cycle through the training set many times
• Use the perceptron convergence procedure to train each unit to have the correct state given the states of all the other units in that vector (sketched below)
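The sketch below is one rough way to realize that idea in NumPy; it is an interpretation, not the lecture's exact procedure. For each training vector and each unit, a perceptron-style update is applied to the unit's bias and incoming weights whenever the unit's threshold decision (made from the other units' states) disagrees with its target state. States are assumed to be +1/−1.

```python
import numpy as np

def perceptron_train(memories, epochs=50, lr=1.0):
    """Train each unit to take its correct state given all the other units.

    memories: (M, N) array of +1/-1 state vectors.  Note that trained this
    way the weight matrix is not explicitly forced to stay symmetric."""
    _, n = memories.shape
    W = np.zeros((n, n))
    b = np.zeros(n)
    for _ in range(epochs):
        for s in memories:
            for i in range(n):
                others = s.copy()
                others[i] = 0.0                        # use only the other units
                decision = 1.0 if b[i] + W[i] @ others > 0 else -1.0
                if decision != s[i]:                   # wrong -> perceptron update
                    b[i] += lr * s[i]
                    W[i] += lr * s[i] * others
    np.fill_diagonal(W, 0.0)
    return W, b
```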


Page 15: Deep Learning : Hopfield Networks

Hopfield Nets with Hidden Units (Lecture 11c)


Page 16: Deep Learning : Hopfield Networks

A Different Computational Role

• Use the network to construct interpretations of sensory input

[Diagram: a layer of hidden units (the interpretation) connected to a layer of visible units (the sensory input)]

Page 17: Deep Learning : Hopfield Networks

Example: What can we infer about 3D edges from 2D lines?

• The information that has been lost in the image is the 3D depth of each end of the 2D line


Page 18: Deep Learning : Hopfield Networks

An example : Interpreting a Line Drawing

• Use a 2D line unit for each possible 2D line
• Use a 3D line unit for each possible 3D line
• Make 3D lines support each other if they join in 3D
• Make them strongly support each other if they join at right angles


Page 19: Deep Learning : Hopfield Networks

Two Difficult Computational Issues

• Searching: How do we avoid getting trapped in local minima?
• Learning: How do we learn the weights on the connections between units?


Page 20: Deep Learning : Hopfield Networks

Using Stochastic Units to Improve Search (Lecture 11d)


Page 21: Deep Learning : Hopfield Networks

Noisy Networks Find Better Energy Minima

• Use random noise to escape from poor minima

$p(s_i = 1) = \dfrac{1}{1 + e^{-\Delta E_i / T}}$ (where $\Delta E_i = b_i + \sum_j s_j w_{ij}$, as before, and $T$ is the temperature)

[Diagram: two states A and B with energy gaps ΔE_A and ΔE_B]
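A minimal sketch of one such stochastic update, using the same toy weight/bias conventions as the earlier sketches (T is the temperature):

```python
import numpy as np

def stochastic_update(s, W, b, i, T, rng):
    """Turn unit i on with probability 1 / (1 + exp(-dE_i / T)).

    dE_i = b_i + sum_j s_j w_ij is the energy gap; as T -> 0 this
    recovers the deterministic binary threshold rule."""
    dE = b[i] + W[i] @ s
    p_on = 1.0 / (1.0 + np.exp(-dE / T))
    s[i] = 1.0 if rng.random() < p_on else 0.0
    return s
```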

Page 22: Deep Learning : Hopfield Networks

How Temperature Affects Transition Prob.

• $\dfrac{p(s_A)}{p(s_B)} = \dfrac{1 + e^{-\Delta E_B / T}}{1 + e^{-\Delta E_A / T}}$ → if $T$ decreases, the ratio increases
• So a low-temperature system is much better
• But it will take much more time!


Page 23: Deep Learning : Hopfield Networks

Approaching Thermal Equilibrium

• Thermal equilibrium does not mean that the system has settled down into the lowest-energy configuration!
• It is the probability distribution over configurations that settles down
• Imagine a huge ensemble of identical systems and start with any distribution we like over their configurations
• Keep applying the stochastic update rule to pick each system's next configuration
• We may reach a situation where the fraction of systems in each configuration remains constant


Page 24: Deep Learning : Hopfield Networks

How a Boltzmann Machine Models Data (Lecture 11e)


Page 25: Deep Learning : Hopfield Networks

Modeling Binary Data

• Assign a probability to every possible binary vector
• Useful for deciding whether other binary vectors come from the same distribution
• Can be used for monitoring a complex system to detect unusual behavior

$p(\text{model } i \mid \text{data}) = \dfrac{p(\text{data} \mid \text{model } i)}{\sum_j p(\text{data} \mid \text{model } j)}$
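A toy illustration of that posterior; the three candidate models and their likelihoods are made up, and a uniform prior over models is assumed (as the formula implies):

```python
import numpy as np

# Made-up likelihoods p(data | model j) for three candidate models.
likelihoods = np.array([0.020, 0.005, 0.001])

# Posterior over models with an implicit uniform prior.
posterior = likelihoods / likelihoods.sum()
print(posterior)          # approximately [0.77, 0.19, 0.04]
```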


Page 26: Deep Learning : Hopfield Networks

Causal Model

• First step: pick the hidden states from their prior distribution
• Second step: pick the visible states from their conditional distribution given the hidden states


$p(v) = \sum_h p(h)\, p(v \mid h)$

[Diagram: two binary hidden units generating three binary visible units]
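A tiny enumeration of that sum; the factored prior and conditional probabilities below are made-up placeholders for a model with two hidden and three visible binary units:

```python
import itertools
import numpy as np

p_h = np.array([0.3, 0.8])                  # made-up p(h_k = 1), independent hidden units
p_v_on = {                                  # made-up p(v_i = 1 | h) for each hidden config
    (0, 0): np.array([0.1, 0.2, 0.1]),
    (0, 1): np.array([0.7, 0.3, 0.6]),
    (1, 0): np.array([0.2, 0.8, 0.4]),
    (1, 1): np.array([0.9, 0.9, 0.7]),
}

def p_visible(v):
    """p(v) = sum_h p(h) * p(v | h), enumerating the four hidden configurations."""
    v = np.asarray(v)
    total = 0.0
    for h in itertools.product([0, 1], repeat=2):
        prior = np.prod([p_h[k] if h[k] else 1.0 - p_h[k] for k in range(2)])
        on = p_v_on[h]
        total += prior * np.prod(np.where(v == 1, on, 1.0 - on))
    return total

print(p_visible([1, 0, 1]))
```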

Page 27: Deep Learning : Hopfield Networks

How a Boltzmann Machine Generates Data

• Everything is defined in terms of the energies of joint configurations
• The energies of joint configurations are related to their probabilities in two ways:
• Simply define the probability as $p(v, h) \propto e^{-E(v,h)}$
• Or define it as the probability of finding the network in that joint configuration after we have updated all of the stochastic binary units


Page 28: Deep Learning : Hopfield Networks

Using Energy to Define Probabilities

$-E(v, h) = \sum_{i \in \text{vis}} v_i b_i + \sum_{k \in \text{hid}} h_k b_k + \sum_{i<j} v_i v_j w_{ij} + \sum_{i,k} v_i h_k w_{ik} + \sum_{k<l} h_k h_l w_{kl}$

• Probability of a joint configuration over both the visible and hidden units:

$p(v, h) = \dfrac{e^{-E(v,h)}}{\sum_{u,g} e^{-E(u,g)}}$

• Probability of a configuration of the visible units alone:

$p(v) = \dfrac{\sum_h e^{-E(v,h)}}{\sum_{u,g} e^{-E(u,g)}}$
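For a network small enough to enumerate, these sums can be computed directly. A minimal sketch with two visible and two hidden units; the weights and biases are arbitrary made-up numbers:

```python
import itertools
import numpy as np

def neg_energy(v, h, b_v, b_h, W_vv, W_vh, W_hh):
    """-E(v,h): bias terms plus visible-visible, visible-hidden and hidden-hidden pair terms."""
    return (v @ b_v + h @ b_h
            + 0.5 * v @ W_vv @ v            # 0.5 * sum_{i,j} equals sum_{i<j} (symmetric, zero diagonal)
            + v @ W_vh @ h
            + 0.5 * h @ W_hh @ h)

# Made-up parameters: 2 visible and 2 hidden units.
b_v = np.zeros(2)
b_h = np.zeros(2)
W_vv = np.zeros((2, 2))
W_hh = np.array([[0.0, -1.0], [-1.0, 0.0]])
W_vh = np.array([[2.0, 0.0], [0.0, 1.0]])

configs = [(np.array(v, float), np.array(h, float))
           for v in itertools.product([0, 1], repeat=2)
           for h in itertools.product([0, 1], repeat=2)]
boltz = np.array([np.exp(neg_energy(v, h, b_v, b_h, W_vv, W_vh, W_hh)) for v, h in configs])
p_joint = boltz / boltz.sum()               # p(v, h); the denominator is the partition function

p_v = {}                                    # p(v) = sum_h p(v, h)
for (v, _), p in zip(configs, p_joint):
    p_v[tuple(v)] = p_v.get(tuple(v), 0.0) + p
print(p_v)
```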

Page 29: Deep Learning : Hopfield Networks

EX: How Weights Define a Distribution

[Diagram: a small Boltzmann machine with hidden units h1, h2 and visible units v1, v2; the weights shown are −1, +2, +1]

Page 30: Deep Learning : Hopfield Networks

Getting a Sample From the Model

• With even a few more hidden units there are exponentially many terms in these sums
• So use MCMC, starting from a random global configuration and applying the stochastic update rule until the chain reaches thermal equilibrium
• The probability of a global configuration is then related to its energy by the Boltzmann distribution (see the sketch below)
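A rough Gibbs-style sketch of that procedure, reusing the stochastic update rule from the earlier slide; the parameters are again made-up placeholders:

```python
import numpy as np

def sample_from_model(W, b, n_steps=5000, T=1.0, seed=0):
    """MCMC: start from a random global configuration and keep applying
    the stochastic update rule; after enough steps the state is roughly a
    sample from the Boltzmann distribution p(s) proportional to exp(-E(s)/T)."""
    rng = np.random.default_rng(seed)
    n = len(b)
    s = rng.integers(0, 2, size=n).astype(float)     # random global configuration
    for _ in range(n_steps):
        i = rng.integers(n)                          # pick one unit at random
        dE = b[i] + W[i] @ s                         # its energy gap
        s[i] = 1.0 if rng.random() < 1.0 / (1.0 + np.exp(-dE / T)) else 0.0
    return s
```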


Page 31: Deep Learning : Hopfield Networks

Getting a Sample From the Posterior Distribution over Hidden Configurations for a Given Data Vector

• The number of possible hidden configurations is exponential, so again we need MCMC
• The procedure is the same as before, except that the visible units are kept clamped to the given data vector
• Only the hidden units are allowed to change states
• These samples are required for learning the weights
• Each hidden configuration is an explanation of the observed visible configuration
