CS621: Artificial Intelligence
Lecture 22-23: Sigmoid neuron, Backpropagation
(Lectures 20 and 21 taken by Anup on Graphical Models)
Pushpak Bhattacharyya
Computer Science and Engineering Department
IIT Bombay
Training of the MLP
• Multilayer Perceptron (MLP)
• Question: how do we find weights for the hidden layers when no target output is available for them?
• Credit assignment problem – to be solved by “Gradient Descent”
Gradient Descent Technique
• Let E be the error at the output layer
• $t_i$ = target output; $o_i$ = observed output
• $i$ is the index going over the $n$ neurons in the outermost layer
• $j$ is the index going over the $p$ patterns (1 to $p$)
• Ex: XOR: $p = 4$ and $n = 1$
$$E = \frac{1}{2}\sum_{j=1}^{p}\sum_{i=1}^{n}(t_{ij} - o_{ij})^2$$
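As a concrete illustration, here is a minimal sketch in plain Python of this error for the XOR case ($p = 4$ patterns, $n = 1$ output neuron); the observed outputs below are made-up numbers, not from the lecture:

```python
# E = 1/2 * sum over patterns j (and output neurons i) of (t_ij - o_ij)^2
targets = [0.0, 1.0, 1.0, 0.0]   # XOR targets t_j for (0,0), (0,1), (1,0), (1,1)
outputs = [0.1, 0.8, 0.7, 0.2]   # hypothetical observed outputs o_j

E = 0.5 * sum((t - o) ** 2 for t, o in zip(targets, outputs))
print(E)  # 0.09 for these numbers
```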
Weights in a ff NN
• $w_{mn}$ is the weight of the connection from the $n^{th}$ neuron to the $m^{th}$ neuron
• The $E$ vs. $\vec{W}$ surface is a complex surface in the space defined by the weights $w_{ij}$
• $-\frac{\partial E}{\partial w_{mn}}$ gives the direction in which a movement of the operating point in the $w_{mn}$ co-ordinate space will result in maximum decrease in error

$$\Delta w_{mn} \propto -\frac{\partial E}{\partial w_{mn}}$$

[Figure: neurons $m$ and $n$ connected by weight $w_{mn}$]
Sigmoid neurons
• Gradient descent needs a derivative computation – not possible in the perceptron due to the discontinuous step function used!
• Sigmoid neurons with easy-to-compute derivatives are used!
• Computing power comes from the non-linearity of the sigmoid function.
$$y = \frac{1}{1 + e^{-x}}$$
$y \to 1$ as $x \to \infty$; $y \to 0$ as $x \to -\infty$
Derivative of Sigmoid function
$$y = \frac{1}{1 + e^{-x}}$$
$$\frac{dy}{dx} = \frac{e^{-x}}{(1 + e^{-x})^2} = \frac{1}{1 + e^{-x}}\left(1 - \frac{1}{1 + e^{-x}}\right) = y(1 - y)$$
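A minimal sketch (plain Python; the function names are my own) that checks this identity against a finite-difference estimate:

```python
import math

def sigmoid(x):
    """y = 1 / (1 + e^(-x))"""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    """dy/dx = y * (1 - y), using the identity derived above."""
    y = sigmoid(x)
    return y * (1.0 - y)

# Finite-difference check at a few points: both columns should agree.
h = 1e-6
for x in (-2.0, 0.0, 3.0):
    numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
    print(x, sigmoid_derivative(x), numeric)
```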
Training algorithm
• Initialize weights to random values.
• For input $x = \langle x_n, x_{n-1}, \ldots, x_0 \rangle$, modify weights as follows (target output = $t$, observed output = $o$)
• Iterate until $E < \epsilon$ (threshold)
$$\Delta w_i \propto -\frac{\partial E}{\partial w_i}, \qquad E = \frac{1}{2}(t - o)^2$$
Calculation of $\Delta w_i$
$$\Delta w_i = -\eta \frac{\partial E}{\partial w_i} \qquad (\eta = \text{learning constant}, \; 0 < \eta \le 1)$$
$$\frac{\partial E}{\partial w_i} = \frac{\partial E}{\partial o} \cdot \frac{\partial o}{\partial net} \cdot \frac{\partial net}{\partial w_i} \qquad \text{where } net = \sum_{i=0}^{n} w_i x_i$$
$$\frac{\partial E}{\partial w_i} = -(t - o) \cdot o(1 - o) \cdot x_i$$
$$\text{Hence } \Delta w_i = \eta \, (t - o) \, o (1 - o) \, x_i$$
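A minimal sketch of one such update step for a single sigmoid neuron (plain Python; the function and variable names are mine, not from the lecture):

```python
import math

def train_step(w, x, t, eta=0.5):
    """One gradient-descent update for a single sigmoid neuron.
    w, x: weight and input vectors; t: target output;
    eta: learning constant, 0 < eta <= 1."""
    net = sum(wi * xi for wi, xi in zip(w, x))
    o = 1.0 / (1.0 + math.exp(-net))   # observed output
    # Delta w_i = eta * (t - o) * o * (1 - o) * x_i
    return [wi + eta * (t - o) * o * (1 - o) * xi for wi, xi in zip(w, x)]

w = [0.1, -0.2, 0.05]
w = train_step(w, x=[1.0, 0.0, 1.0], t=1.0)   # one update for one pattern
```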
Observations
Does the training technique support our intuition?
• The larger the $x_i$, the larger is $\Delta w_i$
– Error burden is borne by the weight values corresponding to large input values (see the quick numeric check below)
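This proportionality can be read off the update formula directly; a quick numeric check (all numbers are made up for illustration):

```python
# With (t - o) and o(1 - o) held fixed, Delta w_i scales linearly with x_i.
eta, t, o = 0.5, 1.0, 0.6
for x_i in (0.1, 1.0, 10.0):
    print(x_i, eta * (t - o) * o * (1 - o) * x_i)   # Delta w_i grows with x_i
```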
Backpropagation on feedforward network
Backpropagation algorithm
• Fully connected feed forward network
• Pure FF network (no jumping of connections over layers)
[Figure: layered feedforward network – input layer ($n$ i/p neurons), hidden layers, output layer ($m$ o/p neurons); $w_{ji}$ is the weight on the connection from neuron $i$ to neuron $j$]
Gradient Descent Equations
$$\Delta w_{ji} = -\eta \frac{\partial E}{\partial w_{ji}} \qquad (\eta = \text{learning rate}, \; 0 < \eta \le 1)$$
$$\frac{\partial E}{\partial w_{ji}} = \frac{\partial E}{\partial net_j} \cdot \frac{\partial net_j}{\partial w_{ji}} \qquad (net_j = \text{input at the } j^{th} \text{ layer})$$
$$\text{Defining } \delta_j = -\frac{\partial E}{\partial net_j}, \qquad \Delta w_{ji} = \eta \, \delta_j \, o_i$$
Backpropagation – for outermost layer
$$\delta_j = -\frac{\partial E}{\partial net_j} = -\frac{\partial E}{\partial o_j} \cdot \frac{\partial o_j}{\partial net_j} \qquad (net_j = \text{input at the } j^{th} \text{ neuron})$$
$$E = \frac{1}{2} \sum_{p=1}^{m} (t_p - o_p)^2$$
$$\text{Hence, } \delta_j = (t_j - o_j) \, o_j (1 - o_j)$$
$$\Delta w_{ji} = \eta \, (t_j - o_j) \, o_j (1 - o_j) \, o_i$$
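In code, the outermost-layer rule is one line each for $\delta_j$ and $\Delta w_{ji}$ (a sketch; the names are mine):

```python
def output_delta(t_j, o_j):
    """delta_j for an output neuron: (t_j - o_j) * o_j * (1 - o_j)."""
    return (t_j - o_j) * o_j * (1.0 - o_j)

def weight_change(eta, delta_j, o_i):
    """Delta w_ji = eta * delta_j * o_i."""
    return eta * delta_j * o_i

print(weight_change(0.5, output_delta(1.0, 0.7), 0.9))  # sample numbers
```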
Backpropagation for hidden layers
[Figure: the same layered network – input layer ($n$ i/p neurons), hidden layers, output layer ($m$ o/p neurons), with neuron $j$ in a hidden layer and neuron $k$ in the next layer]
• $\delta_k$ is propagated backwards to find the value of $\delta_j$
Backpropagation – for hidden layers
$$\delta_j = -\frac{\partial E}{\partial net_j} = -\frac{\partial E}{\partial o_j} \cdot \frac{\partial o_j}{\partial net_j} = -\frac{\partial E}{\partial o_j} \, o_j (1 - o_j)$$
$$-\frac{\partial E}{\partial o_j} = -\sum_{k \in \text{next layer}} \frac{\partial E}{\partial net_k} \cdot \frac{\partial net_k}{\partial o_j} = \sum_{k \in \text{next layer}} \delta_k \, w_{kj}$$
$$\text{Hence, } \delta_j = o_j (1 - o_j) \sum_{k \in \text{next layer}} \delta_k \, w_{kj}$$
$$\Delta w_{ji} = \eta \, o_j (1 - o_j) \left( \sum_{k \in \text{next layer}} \delta_k \, w_{kj} \right) o_i$$
General Backpropagation Rule
• General weight updating rule:
$$\Delta w_{ji} = \eta \, \delta_j \, o_i$$
• Where
$$\delta_j = (t_j - o_j) \, o_j (1 - o_j) \qquad \text{for the outermost layer}$$
$$\delta_j = o_j (1 - o_j) \sum_{k \in \text{next layer}} \delta_k \, w_{kj} \qquad \text{for hidden layers}$$
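Putting the two cases together, here is a minimal end-to-end sketch of the algorithm on a 2-2-1 sigmoid network trained on XOR (the network size, learning rate, and all names are my choices for illustration, not prescribed by the lecture):

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

random.seed(0)
# 2-2-1 network; the last weight of each neuron acts on a bias input fixed at 1.
hidden = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
output = [random.uniform(-1, 1) for _ in range(3)]
eta = 0.5
patterns = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
            ([1.0, 0.0], 1.0), ([1.1 - 0.1, 1.0], 0.0)]

for epoch in range(20000):
    for x, t in patterns:
        # Forward pass: input -> hidden -> output
        xb = x + [1.0]                                    # inputs plus bias
        h = [sigmoid(sum(w * xi for w, xi in zip(ws, xb))) for ws in hidden]
        hb = h + [1.0]
        o = sigmoid(sum(w * hi for w, hi in zip(output, hb)))

        # Backward pass: deltas per the two cases above
        delta_o = (t - o) * o * (1 - o)                   # outermost layer
        delta_h = [h[j] * (1 - h[j]) * delta_o * output[j]  # hidden layer: sum
                   for j in range(2)]                       # over next layer

        # General rule: w_ji <- w_ji + eta * delta_j * o_i
        output = [w + eta * delta_o * hi for w, hi in zip(output, hb)]
        for j in range(2):
            hidden[j] = [w + eta * delta_h[j] * xi
                         for w, xi in zip(hidden[j], xb)]

# After training, outputs should approach the XOR targets 0, 1, 1, 0.
for x, t in patterns:
    xb = x + [1.0]
    hb = [sigmoid(sum(w * xi for w, xi in zip(ws, xb))) for ws in hidden] + [1.0]
    print(x, t, round(sigmoid(sum(w * hi for w, hi in zip(output, hb))), 2))
```

Note the ordering: the hidden deltas are computed with the *old* output-layer weights before those weights are updated, matching the backward propagation of $\delta_k$ described above.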
How does it work?
• Input propagation forward and error propagation backward (e.g. XOR)
[Figure: a hand-built network computing XOR – output neuron with $w_1 = w_2 = 1$, $\theta = 0.5$ combining two hidden threshold neurons that compute $x_1\bar{x}_2$ and $\bar{x}_1x_2$ from inputs $x_1$ and $x_2$; connection weights of $1.5$ and $-1$ appear on the input-to-hidden links]
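To make the figure concrete, here is a sketch of such a hand-built threshold network (the hidden-unit thresholds are my assumption, chosen so each unit computes the stated function; the output neuron uses the slide's $w_1 = w_2 = 1$, $\theta = 0.5$):

```python
def step(net, theta):
    """Threshold neuron: fires (1) iff net input >= theta."""
    return 1 if net >= theta else 0

def xor_net(x1, x2):
    # Hidden units compute x1 AND NOT-x2, and NOT-x1 AND x2.
    # Weights 1.5 / -1 with threshold 1.0 are one working choice;
    # the exact hidden thresholds in the slide's figure may differ.
    h1 = step(1.5 * x1 - 1.0 * x2, 1.0)    # x1 . not-x2
    h2 = step(-1.0 * x1 + 1.5 * x2, 1.0)   # not-x1 . x2
    # Output neuron: w1 = w2 = 1, theta = 0.5 (an OR of h1, h2).
    return step(1.0 * h1 + 1.0 * h2, 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))   # prints the XOR truth table: 0, 1, 1, 0
```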