
Artificial Neural Networks

Ildar Nurgaliev

Warren McCulloch and Walter Pitts (1943) created a computational model for neural networks based on mathematics and algorithms. They called this model threshold logic.

Neural networks, as used in artificial intelligence, have traditionally been viewed as simplified models of neural processing in the brain.

Introduction

Biological Network and Neural System


Perceptron


Bias: Activation Weight

Perceptron and Boolean Functions

Conjunction: sign(w0 + w1x1 + w2x2) = x1 ⋀ x2

x1   x2   x1 ⋀ x2
-1   -1     -1
-1    1     -1
 1   -1     -1
 1    1      1

Weights: w0 = -1, w1 = 1, w2 = 1

Perceptron and Boolean Functions

Disjunction: sign(w0 + w1x1 + w2x2) = x1 ⋁ x2

x1   x2   x1 ⋁ x2
-1   -1     -1
-1    1      1
 1   -1      1
 1    1      1

Weights: w0 = 1, w1 = 1, w2 = 1
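A minimal sketch (not from the slides) checking these weight choices in Python:

    import numpy as np

    def perceptron(x1, x2, w0, w1, w2):
        """Threshold unit: the sign of the bias plus the weighted inputs."""
        return int(np.sign(w0 + w1 * x1 + w2 * x2))

    inputs = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    # Conjunction with w0 = -1, w1 = w2 = 1: prints -1, -1, -1, 1
    print([perceptron(x1, x2, -1, 1, 1) for x1, x2 in inputs])
    # Disjunction with w0 = 1, w1 = w2 = 1: prints -1, 1, 1, 1
    print([perceptron(x1, x2, 1, 1, 1) for x1, x2 in inputs])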

Geometric Interpretation

(Slide figures: the perceptron's decision boundary is a line in the (x1, x2) plane; the bias weight w0 shifts this separating line away from the origin.)

Perceptron Learning

Ensemble-Teacher Learning (supervised learning: a "teacher" supplies the correct answer for every training example)

Perceptron training set: the conjunction, sign(w0 + w1x1 + w2x2) = x1 ⋀ x2

x1   x2   x1 ⋀ x2
-1   -1     -1
-1    1     -1
 1   -1     -1
 1    1      1

Ensemble-Teacher Learning

Worked example (from the slide figures): all three weights start at 0.5 and the network is shown the example x1 = 1, x2 = -1.

Right ans: a = -1 (the correct value of the conjunction)
Net ans:   y = sign(0.5 + 0.5·1 + 0.5·(-1)) = 1

Direction of learning: d = a - y = -2

Change of weight: Δwi = ε·d·xi·|wi|

Ensemble-Teacher Learning (continued)

(Slide figure: the weights are updated by Δwi = ε·d·xi·|wi| and the perceptron is evaluated again on the example with the new values.)
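A sketch of this single update step in Python (the learning rate ε is an assumption; the slides do not give its value). Note that the classical perceptron rule uses Δwi = ε·d·xi without the extra |wi| factor; the code below follows the slide's rule as written.

    import numpy as np

    w = np.array([0.5, 0.5, 0.5])       # [w0, w1, w2], as in the slide
    x = np.array([1.0, 1.0, -1.0])      # [x0 = 1 (bias input), x1, x2]
    a = -1.0                            # right ans: 1 AND -1 gives -1
    eps = 0.3                           # learning rate (assumed)

    y = 1.0 if w @ x >= 0 else -1.0     # net ans: sign(0.5 + 0.5 - 0.5) = 1
    d = a - y                           # direction of learning: -2
    w = w + eps * d * x * np.abs(w)     # slide's rule: Δwi = ε·d·xi·|wi|
    print(d, w)                         # -2.0 [0.2 0.2 0.8]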

XOR Function

Doesn't work? XOR is not linearly separable, so no choice of weights gives sign(w0 + w1x1 + w2x2) = x1 ⊕ x2:

x1   x2   x1 ⊕ x2
-1   -1     -1
-1    1      1
 1   -1      1
 1    1     -1

Solution

Multilayer Perceptron

Input layer → Hidden layer → Output layer
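A hand-weighted sketch of such a network solving XOR (the weights below are my own choice; the slides do not give them). The two hidden units compute OR and AND, and the output unit combines them:

    def unit(w, x):
        # Threshold unit: sign(w0 + w1*x1 + w2*x2), inputs and output in {-1, +1}
        s = w[0] + w[1] * x[0] + w[2] * x[1]
        return 1 if s >= 0 else -1

    def xor_net(x):
        h1 = unit([1, 1, 1], x)              # hidden unit 1: x1 OR x2
        h2 = unit([-1, 1, 1], x)             # hidden unit 2: x1 AND x2
        return unit([-1, 1, -1], [h1, h2])   # output: h1 AND (NOT h2)

    for x in [(-1, -1), (-1, 1), (1, -1), (1, 1)]:
        print(x, xor_net(x))                 # -1, 1, 1, -1: the XOR column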

Learning as a Function Minimization

Given:
X = (X1, ..., Xk)  input vectors, Xi ∈ R^n
A = (A1, ..., Ak)  correct output vectors, Ai ∈ R^m
(X, A)  learning set
W  vector containing all the weights
N(W, X)  the neural network's function
Y = N(W, X)  the neural network's response, Y ∈ R^m
D(Y, A) = Σ_{j=1..m} (Y[j] - A[j])²  error function
Di(Y) = D(Y, Ai)  error function on the i-th example
Ei(W) = Di(N(W, Xi))  network's error on the i-th example
E(W) = Σ_{i=1..k} Ei(W)  network's error on the whole set

Goal:
Find a vector W such that E(W) ➝ min (learning on the whole set)
Find a vector W such that Ei(W) ➝ min (learning on a particular example)
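These definitions translate almost directly into code. A minimal sketch (the one-layer tanh network standing in for N is an assumption; any differentiable network function would do):

    import numpy as np

    def N(W, x):
        # Example network function: a single tanh layer with a weight matrix and bias
        weights, bias = W
        return np.tanh(weights @ x + bias)

    def D(y, a):
        return np.sum((y - a) ** 2)      # D(Y, A) = Σ_j (Y[j] - A[j])²

    def E(W, X, A):
        # Network's error on the whole learning set (X, A)
        return sum(D(N(W, x), a) for x, a in zip(X, A))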

Gradient Descent Method

Algorithm for a single variable:
1. Initialize x1 with a random value from R
2. i = 1
3. x_{i+1} = x_i - ε·f'(x_i)
4. i++
5. If f(x_i) - f(x_{i+1}) > c, go to 3

Algorithm for GDM (the general, multivariable case):
1. Initialize W1 with a random value from R^n
2. i = 1
3. W_{i+1} = W_i - ε·∇f(W_i)
4. i++
5. If f(W_i) - f(W_{i+1}) > c, go to 3
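A minimal Python sketch of the algorithm above (the function f, its gradient, the step size ε and the stopping threshold c are all assumptions chosen for the example):

    import numpy as np

    def gradient_descent(f, grad_f, w, eps=0.1, c=1e-9, max_iter=10_000):
        for _ in range(max_iter):
            w_next = w - eps * grad_f(w)     # step 3: W_{i+1} = W_i - eps * grad f(W_i)
            if f(w) - f(w_next) <= c:        # step 5: stop once progress drops below c
                return w_next
            w = w_next
        return w

    # Example: minimize f(w) = ||w||², whose gradient is 2w; the minimum is w = 0.
    print(gradient_descent(lambda w: w @ w, lambda w: 2 * w, np.array([3.0, -4.0])))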

Backpropagation Method

Given and Goal: the same setup as in "Learning as a Function Minimization" above: learning set (X, A), weight vector W, network response Y = N(W, X), per-example error Ei(W), and total error E(W) = Σ_{i=1..k} Ei(W) to be minimized.

Backpropagation Method

Dk(y1, y2) = (y1 - a1)² + (y2 - a2)²

Goal: decrease this function, using gradient descent, in order to increase the accuracy of the network.

Calculate the partial derivatives with respect to the outputs:
∂Dk/∂y1 = 2(y1 - a1), ∂Dk/∂y2 = 2(y2 - a2)

But y1 is also a function. Let's consider it as a function of the weights of the first output unit:
y1 = y1(w01, w11, w21) = f(S1), where S1 = w01 + w11·x1 + w21·x2

Now we are able to calculate its partial derivatives. For example:
∂y1/∂w21 = f'(S1)·x2
while the derivative of y1 with respect to any weight of the other output unit (w02, w12, w22) is 0.

Likewise y2 = y2(w02, w12, w22) = f(S2), so
Ek(W) = Dk(y1(w01, w11, w21), y2(w02, w12, w22))
and now we are able to calculate the partial derivative of Ek with respect to each weight.

The same actions apply in the general case:
Dk(y1, ..., yn) = (y1 - a1)² + ... + (yn - an)²
yi = f(Si), Si = Σ_j wji·xj

Calculate the functions Si, yi and each derivative with respect to wji. The formula for the derivative of Ek with respect to each weight of the output layer is:
∂Ek/∂wji = ∂Dk/∂yi · f'(Si) · xj = 2(yi - ai) · f'(Si) · xj

Backpropagation Method

For the hidden layer: yi = yi(x1, ..., xm), and each hidden output is itself a function of its own weights, xj = xj(v0j, ..., vrj).

If it were the case that Dk = Dk(x1, ..., xm), it would mean that
∂Ek/∂vqj = ∂Dk/∂xj · ∂xj/∂vqj

The only thing we don't know is ∂Dk/∂xj, so let's calculate it!

Consider the function f as a function of xj and calculate its derivative:
∂yi/∂xj = f'(Si)·wji

Now we are able to calculate the derivative of Dk with respect to x1, and with respect to any other xj:
∂Dk/∂xj = Σ_i ∂Dk/∂yi · ∂yi/∂xj = Σ_i 2(yi - ai) · f'(Si) · wji

Now do the same actions in the general case: repeating this layer by layer propagates the error derivatives backwards through the whole network.
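A compact sketch of one backpropagation step for a network with a single hidden layer, following the formulas above (tanh as the activation f is an assumption, and bias weights are omitted for brevity):

    import numpy as np

    f = np.tanh
    f_prime = lambda s: 1.0 - np.tanh(s) ** 2

    def backprop_step(V, W, x, a, eps=0.1):
        # Forward pass: hidden outputs x_j = f(S_hidden_j), network outputs y_i = f(S_i)
        S_hidden = V @ x                     # V[j, q] holds the hidden-layer weight v_qj
        h = f(S_hidden)
        S = W @ h                            # W[i, j] holds the output-layer weight w_ji
        y = f(S)

        # Output layer: dEk/dw_ji = 2(y_i - a_i) * f'(S_i) * x_j (here x_j = h[j])
        delta_out = 2.0 * (y - a) * f_prime(S)
        grad_W = np.outer(delta_out, h)

        # Hidden layer: dDk/dx_j = Σ_i 2(y_i - a_i) * f'(S_i) * w_ji, chained to v_qj
        delta_hidden = (W.T @ delta_out) * f_prime(S_hidden)
        grad_V = np.outer(delta_hidden, x)

        # One gradient-descent update of all weights
        return V - eps * grad_V, W - eps * grad_W

    V, W = np.random.randn(3, 2), np.random.randn(2, 3)
    V, W = backprop_step(V, W, x=np.array([1.0, -1.0]), a=np.array([0.0, 1.0]))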

Offline Learning

Train the ANN on the example set, then use the ANN on the real data.

But what if the real data is not i.i.d.?

Online Learning

Learn one instance at a time:
1. Receive an instance
2. Predict the outcome
3. Obtain the real outcome

However, in practice it is not always possible to obtain the real outcome.
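A sketch of this online loop with a perceptron as the learner (the stream() data source and the learning rate are hypothetical placeholders, not from the slides):

    import numpy as np

    w = np.zeros(3)
    for instance, real_outcome in stream():     # stream(): hypothetical (instance, outcome) source
        x = np.array([1.0, *instance])          # 1. receive an instance (bias input prepended)
        y = 1.0 if w @ x >= 0 else -1.0         # 2. predict the outcome
        a = real_outcome                        # 3. obtain the real outcome (not always available)
        w += 0.1 * (a - y) * x                  # learn one instance at a time (perceptron update)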

Spiking Neural Network

● Third generation of ANN models
● Adds the concept of time to the neuron
● One SNN neuron can replace hundreds of hidden neurons in conventional ANN models
● Requires huge computational power

TrueNorth by IBM: 1 million neurons, 256 million synapses, power < 100 mW
