
Page 1: CS621: Artificial Intelligence, Lecture 22-23: Sigmoid neuron, Backpropagation

(Lectures 20 and 21 taken by Anup on Graphical Models)
Pushpak Bhattacharyya
Computer Science and Engineering Department, IIT Bombay

Page 2: Training of the MLP

• Multilayer Perceptron (MLP)

• Question: How to find weights for the hidden layers when no target output is available?

• Credit assignment problem – to be solved by “Gradient Descent”

Page 3: Gradient Descent Technique

• Let E be the error at the output layer
• $t_i$ = target output; $o_i$ = observed output
• $i$ is the index going over the $n$ neurons in the outermost layer
• $j$ is the index going over the $p$ patterns (1 to $p$)
• Ex: XOR: $p = 4$ and $n = 1$

$$E = \frac{1}{2}\sum_{j=1}^{p}\sum_{i=1}^{n}\left(t_{ij} - o_{ij}\right)^{2}$$
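To make the error function concrete, here is a minimal sketch (not from the slides) that evaluates E for the XOR case above, with p = 4 patterns and n = 1 output neuron; the observed outputs are made-up values.

# Error E = 1/2 * sum over patterns and output neurons of (t - o)^2
targets  = [0, 1, 1, 0]          # t_j1 for the four XOR patterns
observed = [0.1, 0.8, 0.7, 0.2]  # o_j1: hypothetical network outputs

E = 0.5 * sum((t - o) ** 2 for t, o in zip(targets, observed))
print(E)  # 0.5 * (0.01 + 0.04 + 0.09 + 0.04) = 0.09, up to floating-point noise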

Page 4: Weights in a ff NN

• $w_{mn}$ is the weight of the connection from the $n$th neuron to the $m$th neuron
• The E vs. $\vec{W}$ surface is a complex surface in the space defined by the weights $w_{ij}$
• $-\dfrac{\partial E}{\partial w_{mn}}$ gives the direction in which a movement of the operating point in the $w_{mn}$ co-ordinate space will result in maximum decrease in error

$$\Delta w_{mn} \propto -\frac{\partial E}{\partial w_{mn}}$$

[Figure: two neurons, $n$ feeding into $m$, connected by the weight $w_{mn}$.]
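As a toy illustration (not from the slides), one gradient-descent step on a one-weight error surface E(w) = (w - 2)^2 moves the weight in the direction of maximum decrease of E; the starting weight and the step size 0.1 are arbitrary.

# One step of gradient descent on the toy surface E(w) = (w - 2)**2
w = 5.0
grad = 2 * (w - 2)   # dE/dw
w = w - 0.1 * grad   # move opposite to the gradient
print(w)             # about 4.4, closer to the minimum at w = 2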

Page 5: Sigmoid neurons

• Gradient Descent needs a derivative computation, which is not possible in the perceptron due to the discontinuous step function used!
• Sigmoid neurons, with easy-to-compute derivatives, are used instead!
• Computing power comes from the non-linearity of the sigmoid function.

$$y = \frac{1}{1 + e^{-x}}, \qquad y \to 1 \text{ as } x \to \infty, \qquad y \to 0 \text{ as } x \to -\infty$$

[Figure: the sigmoid curve.]

Page 6: Derivative of Sigmoid function

$$y = \frac{1}{1 + e^{-x}}$$
$$\frac{dy}{dx} = \frac{-1}{\left(1 + e^{-x}\right)^{2}}\left(-e^{-x}\right) = \frac{e^{-x}}{\left(1 + e^{-x}\right)^{2}}$$
$$= \frac{1}{1 + e^{-x}}\left(1 - \frac{1}{1 + e^{-x}}\right) = y\,(1 - y)$$
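A minimal Python sketch (not part of the slides) of the sigmoid and its derivative, checked against a finite-difference approximation:

import math

def sigmoid(x):
    # y = 1 / (1 + e^(-x))
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    # dy/dx = y * (1 - y), the closed form derived above
    y = sigmoid(x)
    return y * (1.0 - y)

# sanity check at x = 0.3: analytic vs. numerical derivative
h = 1e-6
numeric = (sigmoid(0.3 + h) - sigmoid(0.3 - h)) / (2 * h)
print(sigmoid_derivative(0.3), numeric)  # the two values agree closely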

Page 7: Training algorithm

• Initialize weights to random values.
• For input $x = \langle x_n, x_{n-1}, \ldots, x_0 \rangle$, modify weights as follows (Target output = $t$, Observed output = $o$):

$$E = \frac{1}{2}(t - o)^{2}, \qquad \Delta w_i = -\eta\,\frac{\partial E}{\partial w_i}$$

• Iterate until $E < \epsilon$ (a chosen threshold)
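A minimal sketch of this loop (assumptions: a single sigmoid neuron, one made-up training pattern, and a finite-difference estimate of dE/dw_i, since the closed-form gradient only appears on the next slide):

import math, random

def output(w, x):
    # single sigmoid neuron: o = 1 / (1 + exp(-sum_i w_i * x_i))
    return 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))

def error(w, x, t):
    return 0.5 * (t - output(w, x)) ** 2      # E = (1/2)(t - o)^2

x, t = [0.7, 0.3, -1.0], 1.0                  # one illustrative pattern <x2, x1, x0> and target
w = [random.uniform(-1, 1) for _ in x]        # initialize weights to random values
eta, eps, h = 0.5, 0.01, 1e-6                 # learning rate, threshold, finite-difference step

while error(w, x, t) >= eps:                  # iterate until E < eps
    grads = [(error(w[:i] + [w[i] + h] + w[i+1:], x, t) - error(w, x, t)) / h
             for i in range(len(w))]          # dE/dw_i by finite differences
    w = [wi - eta * g for wi, g in zip(w, grads)]   # delta w_i = -eta * dE/dw_i
print(w, output(w, x))                        # final weights and an output close to t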

Page 8: Calculation of ∆wi

$$\Delta w_i = -\eta\,\frac{\partial E}{\partial w_i} \qquad (\eta = \text{learning constant},\ 0 \le \eta \le 1)$$
$$\frac{\partial E}{\partial w_i} = \frac{\partial E}{\partial o}\cdot\frac{\partial o}{\partial \mathrm{net}}\cdot\frac{\partial \mathrm{net}}{\partial w_i}, \qquad \text{where } \mathrm{net} = \sum_{i=0}^{n} w_i x_i$$
$$\frac{\partial E}{\partial w_i} = -(t - o)\,o\,(1 - o)\,x_i$$
$$\text{Hence, } \Delta w_i = \eta\,(t - o)\,o\,(1 - o)\,x_i$$
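The same single-neuron update as on the previous page, now using this closed-form gradient instead of finite differences (a sketch; the weights, pattern, and learning rate are illustrative):

import math

def train_step(w, x, t, eta=0.5):
    # one application of: delta w_i = eta * (t - o) * o * (1 - o) * x_i
    o = 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
    return [wi + eta * (t - o) * o * (1 - o) * xi for wi, xi in zip(w, x)]

w = [0.2, -0.4, 0.1]
print(train_step(w, [0.7, 0.3, -1.0], 1.0))  # each weight moves so that o increases toward t = 1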

Page 9: Observations

Does the training technique support our intuition?

• The larger the $x_i$, the larger is $\Delta w_i$ – the error burden is borne by the weight values corresponding to large input values.

Page 10: Backpropagation on feedforward network

Page 11: Backpropagation algorithm

• Fully connected feed forward network
• Pure FF network (no jumping of connections over layers)

[Figure: a layered network with an input layer ($n$ i/p neurons), hidden layers, and an output layer ($m$ o/p neurons); a neuron $j$ is connected to a neuron $i$ in the previous layer by the weight $w_{ji}$.]
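One simple way to represent such a pure feedforward network (an assumption for illustration, not prescribed by the slides) is a list of per-layer weight matrices, W[l][j][i] being the weight from neuron i of layer l to neuron j of layer l+1:

import random

def make_network(layer_sizes):
    # layer_sizes, e.g. [2, 2, 1]: n = 2 input neurons, one hidden layer of 2, m = 1 output neuron
    # W[l][j][i] = w_ji between layer l and layer l+1 (an extra column holds a bias weight)
    return [[[random.uniform(-1, 1) for _ in range(n_in + 1)]
             for _ in range(n_out)]
            for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

net = make_network([2, 2, 1])
print(len(net), len(net[0]), len(net[0][0]))  # 2 weight matrices; the first is 2 x 3 (2 inputs + bias)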

Page 12: Gradient Descent Equations

$$\Delta w_{ji} = -\eta\,\frac{\partial E}{\partial w_{ji}} \qquad (\eta = \text{learning rate},\ 0 \le \eta \le 1)$$
$$\frac{\partial E}{\partial w_{ji}} = \frac{\partial E}{\partial \mathrm{net}_j}\cdot\frac{\partial \mathrm{net}_j}{\partial w_{ji}} \qquad (\mathrm{net}_j = \text{input at the } j\text{th layer})$$
$$= \frac{\partial E}{\partial \mathrm{net}_j}\,o_i$$
$$\text{Writing } \delta_j = -\frac{\partial E}{\partial \mathrm{net}_j}, \qquad \Delta w_{ji} = \eta\,\delta_j\,o_i$$

Page 13: Backpropagation – for outermost layer

$$\delta_j = -\frac{\partial E}{\partial \mathrm{net}_j} = -\frac{\partial E}{\partial o_j}\cdot\frac{\partial o_j}{\partial \mathrm{net}_j} \qquad (\mathrm{net}_j = \text{input at the } j\text{th layer})$$
$$E = \frac{1}{2}\sum_{p=1}^{m}\left(t_p - o_p\right)^{2}$$
$$\text{Hence, } \delta_j = (t_j - o_j)\,o_j\,(1 - o_j)$$
$$\Delta w_{ji} = \eta\,(t_j - o_j)\,o_j\,(1 - o_j)\,o_i$$
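A small sketch of these two formulas in isolation (list-based, with made-up targets and outputs; the helper names are mine, not from the slides):

def output_layer_deltas(targets, outputs):
    # delta_j = (t_j - o_j) * o_j * (1 - o_j) for every output neuron j
    return [(t - o) * o * (1 - o) for t, o in zip(targets, outputs)]

def weight_updates(deltas, prev_outputs, eta=0.5):
    # delta w_ji = eta * delta_j * o_i for every pair (j, i)
    return [[eta * dj * oi for oi in prev_outputs] for dj in deltas]

deltas = output_layer_deltas([1.0], [0.6])       # approximately [0.096]
print(weight_updates(deltas, [0.3, 0.7, -1.0]))  # approximately [[0.0144, 0.0336, -0.048]]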

Page 14: Backpropagation for hidden layers

[Figure: the same layered network, with $k$ marking a neuron in the layer following the hidden neuron $j$; input layer ($n$ i/p neurons), hidden layers, output layer ($m$ o/p neurons).]

• $\delta_k$ is propagated backwards to find the value of $\delta_j$

Page 15: Backpropagation – for hidden layers

$$\Delta w_{ji} = \eta\,\delta_j\,o_i$$
$$\delta_j = -\frac{\partial E}{\partial \mathrm{net}_j} = -\frac{\partial E}{\partial o_j}\cdot\frac{\partial o_j}{\partial \mathrm{net}_j} = -\frac{\partial E}{\partial o_j}\,o_j\,(1 - o_j)$$
$$\frac{\partial E}{\partial o_j} = \sum_{k \in \text{next layer}} \frac{\partial E}{\partial \mathrm{net}_k}\cdot\frac{\partial \mathrm{net}_k}{\partial o_j} = -\sum_{k \in \text{next layer}} \delta_k\,w_{kj}$$
$$\text{Hence, } \delta_j = o_j\,(1 - o_j) \sum_{k \in \text{next layer}} \delta_k\,w_{kj}$$
$$\Delta w_{ji} = \eta\,o_j\,(1 - o_j)\left(\sum_{k \in \text{next layer}} \delta_k\,w_{kj}\right) o_i$$
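A companion sketch to the output-layer helper above (again list-based; next_weights[k][j] is assumed to hold w_kj, the weight from hidden neuron j to next-layer neuron k):

def hidden_layer_deltas(outputs, next_deltas, next_weights):
    # delta_j = o_j * (1 - o_j) * sum over k in the next layer of (delta_k * w_kj)
    return [o * (1 - o) * sum(d_k * next_weights[k][j] for k, d_k in enumerate(next_deltas))
            for j, o in enumerate(outputs)]

# two hidden outputs, one next-layer neuron with delta 0.096 and weights w_k0 = 0.5, w_k1 = -0.3
print(hidden_layer_deltas([0.7, 0.4], [0.096], [[0.5, -0.3]]))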

Page 16: General Backpropagation Rule

• General weight updating rule:
$$\Delta w_{ji} = \eta\,\delta_j\,o_i$$
• Where
$$\delta_j = (t_j - o_j)\,o_j\,(1 - o_j) \quad \text{for the outermost layer}$$
$$\delta_j = o_j\,(1 - o_j) \sum_{k \in \text{next layer}} \delta_k\,w_{kj} \quad \text{for hidden layers}$$

Page 17: How does it work?

• Input is propagated forward and error is propagated backward (e.g. XOR)

[Figure: a network realizing XOR. The output neuron has threshold θ = 0.5 and weights w1 = 1, w2 = 1 from two hidden neurons computing $x_1\bar{x}_2$ and $\bar{x}_1x_2$; the hidden neurons are connected to the inputs $x_1$ and $x_2$ with weights 1.5 and -1.]
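Putting the general rule to work, here is a self-contained sketch (assumptions: a 2-2-1 sigmoid network with bias inputs, learning rate 0.5, and a few thousand epochs; all names and numbers are mine) that learns XOR by forward propagation of inputs and backward propagation of error:

import math, random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# hidden layer: 2 neurons x (2 inputs + bias); output layer: 1 neuron x (2 hidden + bias)
W_h = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
W_o = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(1)]
patterns = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
eta = 0.5

for epoch in range(10000):
    for x, t in patterns:
        # forward pass
        xb = x + [-1.0]                                         # append bias input
        h = [sigmoid(sum(w * xi for w, xi in zip(row, xb))) for row in W_h]
        hb = h + [-1.0]
        o = [sigmoid(sum(w * hi for w, hi in zip(row, hb))) for row in W_o]

        # deltas: outermost layer first, then the hidden layer (error propagated backward)
        d_o = [(t - o[0]) * o[0] * (1 - o[0])]
        d_h = [h[j] * (1 - h[j]) * sum(d_o[k] * W_o[k][j] for k in range(len(d_o)))
               for j in range(len(h))]

        # weight updates: delta w_ji = eta * delta_j * o_i
        for k in range(len(W_o)):
            for i in range(len(hb)):
                W_o[k][i] += eta * d_o[k] * hb[i]
        for j in range(len(W_h)):
            for i in range(len(xb)):
                W_h[j][i] += eta * d_h[j] * xb[i]

for x, t in patterns:
    xb = x + [-1.0]
    hb = [sigmoid(sum(w * xi for w, xi in zip(row, xb))) for row in W_h] + [-1.0]
    o = sigmoid(sum(w * hi for w, hi in zip(W_o[0], hb)))
    print(x, t, round(o, 2))   # typically close to the XOR targets 0, 1, 1, 0
                               # (an unlucky random start can occasionally get stuck)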