CS621: Artificial Intelligence
Lecture 22-23: Sigmoid neuron, Backpropagation
(Lectures 20 and 21 taken by Anup on Graphical Models)
Pushpak Bhattacharyya
Computer Science and Engineering Department
IIT Bombay
Training of the MLP
• Multilayer Perceptron (MLP)
• Question: how do we find weights for the hidden layers when no target output is available for them?
• Credit assignment problem – to be solved by “Gradient Descent”
Gradient Descent Technique
• Let E be the error at the output layer
• $t_i$ = target output; $o_i$ = observed output
• $i$ is the index going over the $n$ neurons in the outermost layer
• $j$ is the index going over the $p$ patterns (1 to $p$)
• Ex: XOR: $p = 4$ and $n = 1$
$$E = \frac{1}{2}\sum_{j=1}^{p}\sum_{i=1}^{n}(t_{ij} - o_{ij})^2$$
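As a concrete illustration, here is a minimal sketch in plain Python of this error for the XOR case ($p = 4$ patterns, $n = 1$ output neuron); the observed outputs below are made-up numbers, not from the lecture:

```python
# E = 1/2 * sum over patterns j (and output neurons i) of (t_ij - o_ij)^2
targets = [0.0, 1.0, 1.0, 0.0]   # XOR targets t_j for (0,0), (0,1), (1,0), (1,1)
outputs = [0.1, 0.8, 0.7, 0.2]   # hypothetical observed outputs o_j

E = 0.5 * sum((t - o) ** 2 for t, o in zip(targets, outputs))
print(E)  # 0.09 for these numbers
```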
Weights in a ff NN
• $w_{mn}$ is the weight of the connection from the $n^{th}$ neuron to the $m^{th}$ neuron
• The $E$ vs. $\vec{W}$ surface is a complex surface in the space defined by the weights $w_{ij}$
• $-\frac{\partial E}{\partial w_{mn}}$ gives the direction in which a movement of the operating point in the $w_{mn}$ co-ordinate space will result in maximum decrease in error

$$\Delta w_{mn} \propto -\frac{\partial E}{\partial w_{mn}}$$

[Figure: neurons $m$ and $n$ connected by weight $w_{mn}$]
Sigmoid neurons
• Gradient descent needs a derivative computation – not possible in the perceptron due to the discontinuous step function used!
• Sigmoid neurons with easy-to-compute derivatives are used!
• Computing power comes from the non-linearity of the sigmoid function.
$$y = \frac{1}{1 + e^{-x}}$$
$y \to 1$ as $x \to \infty$; $y \to 0$ as $x \to -\infty$
Derivative of Sigmoid function
$$y = \frac{1}{1 + e^{-x}}$$
$$\frac{dy}{dx} = \frac{e^{-x}}{(1 + e^{-x})^2} = \frac{1}{1 + e^{-x}}\left(1 - \frac{1}{1 + e^{-x}}\right) = y(1 - y)$$
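A minimal sketch (plain Python; the function names are my own) that checks this identity against a finite-difference estimate:

```python
import math

def sigmoid(x):
    """y = 1 / (1 + e^(-x))"""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    """dy/dx = y * (1 - y), using the identity derived above."""
    y = sigmoid(x)
    return y * (1.0 - y)

# Finite-difference check at a few points: both columns should agree.
h = 1e-6
for x in (-2.0, 0.0, 3.0):
    numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
    print(x, sigmoid_derivative(x), numeric)
```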
Training algorithm
• Initialize weights to random values.
• For input $x = \langle x_n, x_{n-1}, \ldots, x_0 \rangle$, modify weights as follows (target output = $t$, observed output = $o$)
• Iterate until $E < \epsilon$ (threshold)
$$\Delta w_i \propto -\frac{\partial E}{\partial w_i}, \qquad E = \frac{1}{2}(t - o)^2$$
Calculation of $\Delta w_i$
$$\Delta w_i = -\eta \frac{\partial E}{\partial w_i} \qquad (\eta = \text{learning constant}, \; 0 < \eta \le 1)$$
$$\frac{\partial E}{\partial w_i} = \frac{\partial E}{\partial o} \cdot \frac{\partial o}{\partial net} \cdot \frac{\partial net}{\partial w_i} \qquad \text{where } net = \sum_{i=0}^{n} w_i x_i$$
$$\frac{\partial E}{\partial w_i} = -(t - o) \cdot o(1 - o) \cdot x_i$$
$$\text{Hence } \Delta w_i = \eta \, (t - o) \, o (1 - o) \, x_i$$
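A minimal sketch of one such update step for a single sigmoid neuron (plain Python; the function and variable names are mine, not from the lecture):

```python
import math

def train_step(w, x, t, eta=0.5):
    """One gradient-descent update for a single sigmoid neuron.
    w, x: weight and input vectors; t: target output;
    eta: learning constant, 0 < eta <= 1."""
    net = sum(wi * xi for wi, xi in zip(w, x))
    o = 1.0 / (1.0 + math.exp(-net))   # observed output
    # Delta w_i = eta * (t - o) * o * (1 - o) * x_i
    return [wi + eta * (t - o) * o * (1 - o) * xi for wi, xi in zip(w, x)]

w = [0.1, -0.2, 0.05]
w = train_step(w, x=[1.0, 0.0, 1.0], t=1.0)   # one update for one pattern
```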
Observations
Does the training technique support our intuition?
• The larger the $x_i$, the larger is $\Delta w_i$
– Error burden is borne by the weight values corresponding to large input values (see the quick numeric check below)
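This proportionality can be read off the update formula directly; a quick numeric check (all numbers are made up for illustration):

```python
# With (t - o) and o(1 - o) held fixed, Delta w_i scales linearly with x_i.
eta, t, o = 0.5, 1.0, 0.6
for x_i in (0.1, 1.0, 10.0):
    print(x_i, eta * (t - o) * o * (1 - o) * x_i)   # Delta w_i grows with x_i
```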
Backpropagation on feedforward network
Backpropagation algorithm
• Fully connected feed forward network
• Pure FF network (no jumping of connections over layers)
[Figure: layered feedforward network – input layer ($n$ i/p neurons), hidden layers, output layer ($m$ o/p neurons); $w_{ji}$ is the weight on the connection from neuron $i$ to neuron $j$]
Gradient Descent Equations
$$\Delta w_{ji} = -\eta \frac{\partial E}{\partial w_{ji}} \qquad (\eta = \text{learning rate}, \; 0 < \eta \le 1)$$
$$\frac{\partial E}{\partial w_{ji}} = \frac{\partial E}{\partial net_j} \cdot \frac{\partial net_j}{\partial w_{ji}} \qquad (net_j = \text{input at the } j^{th} \text{ layer})$$
$$\text{Defining } \delta_j = -\frac{\partial E}{\partial net_j}, \qquad \Delta w_{ji} = \eta \, \delta_j \, o_i$$
Backpropagation – for outermost layer
$$\delta_j = -\frac{\partial E}{\partial net_j} = -\frac{\partial E}{\partial o_j} \cdot \frac{\partial o_j}{\partial net_j} \qquad (net_j = \text{input at the } j^{th} \text{ neuron})$$
$$E = \frac{1}{2} \sum_{p=1}^{m} (t_p - o_p)^2$$
$$\text{Hence, } \delta_j = (t_j - o_j) \, o_j (1 - o_j)$$
$$\Delta w_{ji} = \eta \, (t_j - o_j) \, o_j (1 - o_j) \, o_i$$
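In code, the outermost-layer rule is one line each for $\delta_j$ and $\Delta w_{ji}$ (a sketch; the names are mine):

```python
def output_delta(t_j, o_j):
    """delta_j for an output neuron: (t_j - o_j) * o_j * (1 - o_j)."""
    return (t_j - o_j) * o_j * (1.0 - o_j)

def weight_change(eta, delta_j, o_i):
    """Delta w_ji = eta * delta_j * o_i."""
    return eta * delta_j * o_i

print(weight_change(0.5, output_delta(1.0, 0.7), 0.9))  # sample numbers
```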
Backpropagation for hidden layers
[Figure: the same layered network – input layer ($n$ i/p neurons), hidden layers, output layer ($m$ o/p neurons), with neuron $j$ in a hidden layer and neuron $k$ in the next layer]
• $\delta_k$ is propagated backwards to find the value of $\delta_j$
Backpropagation – for hidden layers
$$\delta_j = -\frac{\partial E}{\partial net_j} = -\frac{\partial E}{\partial o_j} \cdot \frac{\partial o_j}{\partial net_j} = -\frac{\partial E}{\partial o_j} \, o_j (1 - o_j)$$
$$-\frac{\partial E}{\partial o_j} = -\sum_{k \in \text{next layer}} \frac{\partial E}{\partial net_k} \cdot \frac{\partial net_k}{\partial o_j} = \sum_{k \in \text{next layer}} \delta_k \, w_{kj}$$
$$\text{Hence, } \delta_j = o_j (1 - o_j) \sum_{k \in \text{next layer}} \delta_k \, w_{kj}$$
$$\Delta w_{ji} = \eta \, o_j (1 - o_j) \left( \sum_{k \in \text{next layer}} \delta_k \, w_{kj} \right) o_i$$
General Backpropagation Rule
• General weight updating rule:
$$\Delta w_{ji} = \eta \, \delta_j \, o_i$$
• Where
$$\delta_j = (t_j - o_j) \, o_j (1 - o_j) \qquad \text{for the outermost layer}$$
$$\delta_j = o_j (1 - o_j) \sum_{k \in \text{next layer}} \delta_k \, w_{kj} \qquad \text{for hidden layers}$$
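Putting the two cases together, here is a minimal end-to-end sketch of the algorithm on a 2-2-1 sigmoid network trained on XOR (the network size, learning rate, and all names are my choices for illustration, not prescribed by the lecture):

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

random.seed(0)
# 2-2-1 network; the last weight of each neuron acts on a bias input fixed at 1.
hidden = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
output = [random.uniform(-1, 1) for _ in range(3)]
eta = 0.5
patterns = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
            ([1.0, 0.0], 1.0), ([1.1 - 0.1, 1.0], 0.0)]

for epoch in range(20000):
    for x, t in patterns:
        # Forward pass: input -> hidden -> output
        xb = x + [1.0]                                    # inputs plus bias
        h = [sigmoid(sum(w * xi for w, xi in zip(ws, xb))) for ws in hidden]
        hb = h + [1.0]
        o = sigmoid(sum(w * hi for w, hi in zip(output, hb)))

        # Backward pass: deltas per the two cases above
        delta_o = (t - o) * o * (1 - o)                   # outermost layer
        delta_h = [h[j] * (1 - h[j]) * delta_o * output[j]  # hidden layer: sum
                   for j in range(2)]                       # over next layer

        # General rule: w_ji <- w_ji + eta * delta_j * o_i
        output = [w + eta * delta_o * hi for w, hi in zip(output, hb)]
        for j in range(2):
            hidden[j] = [w + eta * delta_h[j] * xi
                         for w, xi in zip(hidden[j], xb)]

# After training, outputs should approach the XOR targets 0, 1, 1, 0.
for x, t in patterns:
    xb = x + [1.0]
    hb = [sigmoid(sum(w * xi for w, xi in zip(ws, xb))) for ws in hidden] + [1.0]
    print(x, t, round(sigmoid(sum(w * hi for w, hi in zip(output, hb))), 2))
```

Note the ordering: the hidden deltas are computed with the *old* output-layer weights before those weights are updated, matching the backward propagation of $\delta_k$ described above.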
How does it work?
• Input propagation forward and error propagation backward (e.g. XOR)
[Figure: a hand-built network computing XOR – output neuron with $w_1 = w_2 = 1$, $\theta = 0.5$ combining two hidden threshold neurons that compute $x_1\bar{x}_2$ and $\bar{x}_1x_2$ from inputs $x_1$ and $x_2$; connection weights of $1.5$ and $-1$ appear on the input-to-hidden links]
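To make the figure concrete, here is a sketch of such a hand-built threshold network (the hidden-unit thresholds are my assumption, chosen so each unit computes the stated function; the output neuron uses the slide's $w_1 = w_2 = 1$, $\theta = 0.5$):

```python
def step(net, theta):
    """Threshold neuron: fires (1) iff net input >= theta."""
    return 1 if net >= theta else 0

def xor_net(x1, x2):
    # Hidden units compute x1 AND NOT-x2, and NOT-x1 AND x2.
    # Weights 1.5 / -1 with threshold 1.0 are one working choice;
    # the exact hidden thresholds in the slide's figure may differ.
    h1 = step(1.5 * x1 - 1.0 * x2, 1.0)    # x1 . not-x2
    h2 = step(-1.0 * x1 + 1.5 * x2, 1.0)   # not-x1 . x2
    # Output neuron: w1 = w2 = 1, theta = 0.5 (an OR of h1, h2).
    return step(1.0 * h1 + 1.0 * h2, 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))   # prints the XOR truth table: 0, 1, 1, 0
```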