TRANSCRIPT
Artificial Neural Networks
Ildar Nurgaliev
Warren McCulloch and Walter Pitts (1943) created a
computational model for neural networks based on
mathematics and algorithms. They called this model
threshold logic.
Neural networks, as used in artificial intelligence, have
traditionally been viewed as simplified models of neural
processing in the brain.
Introduction
Biological Network and Neural System
Perceptron
Bias: Activation Weight
Perceptron and Boolean Functions: Conjunction

 x1   x2   x1 ⋀ x2
 -1   -1     -1
 -1    1     -1
  1   -1     -1
  1    1      1

sign(w0 + w1·x1 + w2·x2) = x1 ⋀ x2, with weights w0 = -1, w1 = 1, w2 = 1.
Perceptron and Boolean Functions: Disjunction

 x1   x2   x1 ⋁ x2
 -1   -1     -1
 -1    1      1
  1   -1      1
  1    1      1

sign(w0 + w1·x1 + w2·x2) = x1 ⋁ x2, with weights w0 = 1, w1 = 1, w2 = 1.
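A quick check of both weight settings in code (a minimal sketch; taking sign(0) = +1 is an assumed tie-break, since the slides do not say how ties resolve):

```python
def perceptron(x1, x2, w0, w1, w2):
    """sign(w0 + w1*x1 + w2*x2), taking sign(0) as +1 (assumed convention)."""
    return 1 if w0 + w1 * x1 + w2 * x2 >= 0 else -1

for x1 in (-1, 1):
    for x2 in (-1, 1):
        conj = perceptron(x1, x2, w0=-1, w1=1, w2=1)  # reproduces x1 AND x2
        disj = perceptron(x1, x2, w0=1, w1=1, w2=1)   # reproduces x1 OR x2
        print(x1, x2, conj, disj)
```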
Geometric Interpretation

[Figure: the weights define a separating line; the bias w0 shifts it away from the origin.]
Perceptron Learning
 x1   x2   x1 ⋀ x2
 -1   -1     -1
 -1    1     -1
  1   -1     -1
  1    1      1

sign(w0 + w1·x1 + w2·x2) = x1 ⋀ x2
Ensemble-Teacher Learning

[Figure: a perceptron with all three weights initialized to 0.5 misclassifies a training example: the net answers y = 1 while the teacher's right answer is a = -1. Arrows mark the direction each weight must move.]

Right answer: a
Net answer: y
Direction of learning: d = a - y = -2
Weight change: Δwi = ε·d·xi·|wi|

[Figure: after the update the weighted sum is negative and the net answers -1, matching the teacher.]
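A minimal sketch of one such update using the slide's rule. The concrete numbers (inputs (1, -1), right answer -1, ε = 0.3) are assumptions chosen to reproduce d = -2, since the figure's exact values are not fully recoverable:

```python
import numpy as np

eps = 0.3                          # assumed learning rate
w = np.array([0.5, 0.5, 0.5])      # w0 (bias), w1, w2, all 0.5 as on the slide
x = np.array([1.0, 1.0, -1.0])     # constant bias input, x1, x2 (assumed)
a = -1                             # teacher's right answer

y = 1 if w @ x >= 0 else -1        # net answer: sign(0.5) = 1
d = a - y                          # direction of learning: -2
w += eps * d * x * np.abs(w)       # the slide's rule: dw_i = eps*d*x_i*|w_i|
print(d, w)                        # -2 [0.2 0.2 0.8]
```

Note how w2 moves up while w0 and w1 move down: the sign of xi decides each weight's direction, matching the arrows on the figure.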
XOR-function
Doesn't work?

 x1   x2   x1 ⊕ x2
 -1   -1     -1
 -1    1      1
  1   -1      1
  1    1     -1

No choice of weights makes sign(w0 + w1·x1 + w2·x2) match this table: the positive and negative examples are not linearly separable.
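A quick brute-force check of that claim (the weight grid is an arbitrary choice for illustration):

```python
import itertools
import numpy as np

# XOR truth table rows: (x1, x2, target)
rows = [(-1, -1, -1), (-1, 1, 1), (1, -1, 1), (1, 1, -1)]
grid = np.linspace(-2, 2, 41)      # arbitrary grid of candidate weights
ok = any(
    all((1 if w0 + w1 * x1 + w2 * x2 >= 0 else -1) == t for x1, x2, t in rows)
    for w0, w1, w2 in itertools.product(grid, repeat=3)
)
print(ok)  # False: no single perceptron reproduces XOR
```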
Solution: Multilayer Perceptron

[Figure: a network with an input layer, a hidden layer, and an output layer.]
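A minimal sketch of how a hidden layer fixes XOR, with hand-picked weights (these particular weights are an illustration, not from the slides): one hidden unit computes OR, another computes AND, and the output unit fires when OR holds but AND does not.

```python
def unit(x1, x2, w0, w1, w2):
    return 1 if w0 + w1 * x1 + w2 * x2 >= 0 else -1

def xor(x1, x2):
    h1 = unit(x1, x2, 1, 1, 1)      # hidden unit: x1 OR x2
    h2 = unit(x1, x2, -1, 1, 1)     # hidden unit: x1 AND x2
    return unit(h1, h2, -1, 1, -1)  # output: h1 AND (NOT h2)

for x1 in (-1, 1):
    for x2 in (-1, 1):
        print(x1, x2, xor(x1, x2))  # matches the XOR truth table
```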
Learning as a Function Minimization

Given:
X = (X1, ..., Xk) — input vectors, Xi ∈ R^n
A = (A1, ..., Ak) — correct output vectors, Ai ∈ R^m
(X, A) — learning set
W — vector containing all the weights
N(W, X) — the neural network's function
Y = N(W, X) — the neural network's response, Y ∈ R^m
D(Y, A) = ∑_{j=1}^{m} (Y[j] - A[j])² — error function
Di(Y) = D(Y, Ai) — error function on the i-th example
Ei(W) = Di(N(W, Xi)) — the network's error on the i-th example
E(W) = ∑_{i=1}^{k} Ei(W) — the network's error on the whole set

Goal:
Find a vector W such that E(W) → min (learning on the whole set)
Find a vector W such that Ei(W) → min (learning on a particular example)
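A quick numeric check of these definitions on a made-up learning set (k = 2, m = 2; the values are arbitrary):

```python
import numpy as np

def D(Y, A):
    """Error function: sum over j of (Y[j] - A[j])^2."""
    return float(np.sum((Y - A) ** 2))

A_set = [np.array([1.0, -1.0]), np.array([-1.0, 1.0])]  # correct outputs Ai
Y_set = [np.array([0.8, -0.6]), np.array([-1.0, 0.5])]  # responses Y = N(W, Xi)
E = sum(D(Y, Ai) for Y, Ai in zip(Y_set, A_set))        # E(W) over the set
print(E)  # 0.04 + 0.16 + 0.0 + 0.25 = 0.45
```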
Gradient Descent Method

Algorithm for a single variable:
1. Initialize x1 with a random value from R
2. i = 1
3. x_{i+1} = x_i - ε·f'(x_i)
4. i++
5. If f(x_i) - f(x_{i+1}) > c, go to 3

Algorithm for the multivariate case:
1. Initialize W1 with a random value from R^n
2. i = 1
3. W_{i+1} = W_i - ε·∇f(W_i)
4. i++
5. If f(W_i) - f(W_{i+1}) > c, go to 3
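The same loop as a minimal Python sketch (the quadratic test function, ε, and c are illustrative choices):

```python
import numpy as np

def gradient_descent(f, grad, w, eps=0.1, c=1e-8, max_iter=10_000):
    """Repeat w <- w - eps * grad(w) while the improvement exceeds c."""
    for _ in range(max_iter):
        w_next = w - eps * grad(w)
        if f(w) - f(w_next) <= c:      # stopping rule from step 5
            return w_next
        w = w_next
    return w

# toy quadratic bowl with its minimum at (1, -2)
f = lambda w: (w[0] - 1) ** 2 + (w[1] + 2) ** 2
grad = lambda w: np.array([2 * (w[0] - 1), 2 * (w[1] + 2)])
print(gradient_descent(f, grad, np.random.randn(2)))  # approx. [1, -2]
```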
Backpropagation Method

(Same setup as above: learning set (X, A), weight vector W, network function N(W, X), per-example error Ei(W), total error E(W).)
Dk(y1, y2) = (y1 - a1)² + (y2 - a2)²

Goal: decrease this function, using gradient descent, in order to increase the accuracy of the network.

Calculate the partial derivatives:
∂Dk/∂y1 = 2(y1 - a1)
∂Dk/∂y2 = 2(y2 - a2)
But y1 is itself a function. Let's consider it as a function of the weights; then we can calculate its partial derivatives:

y1 = y1(w01, w11, w21) = f(S1), where S1 = w01 + w11·x1 + w21·x2

For example:
∂y1/∂w21 = f'(S1)·x2

and y1 does not depend on the second output neuron's weights, so ∂y1/∂w22 = 0.

Similarly y2 = y2(w02, w12, w22) = f(S2), so

Ek(W) = Dk(y1(w01, w11, w21), y2(w02, w12, w22))

and now we are able to calculate the partial derivative of Ek with respect to each weight:
∂Ek/∂wj1 = ∂Dk/∂y1 · ∂y1/∂wj1 = 2(y1 - a1)·f'(S1)·xj
The same actions in the general case:

Dk(y1, ..., yn) = (y1 - a1)² + ... + (yn - an)²

yi = f(Si), where Si = w0i + ∑_{j} wji·xj

Formula for the derivative of Ek with respect to each weight:
∂Ek/∂wji = ∂Dk/∂yi · ∂yi/∂wji = 2(yi - ai)·f'(Si)·xj
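A finite-difference sanity check of this formula for a single output neuron (tanh is an assumed choice of f; the inputs, weights, and target are arbitrary):

```python
import numpy as np

f, fprime = np.tanh, lambda s: 1 - np.tanh(s) ** 2
x = np.array([1.0, 0.5, -1.5])      # x0 = 1 is the constant bias input
w = np.array([0.3, -0.2, 0.8])      # weights w0i, w1i, w2i of neuron i
a = 0.7                             # target ai

Ek = lambda w: (f(w @ x) - a) ** 2  # single-output error
S = w @ x
analytic = 2 * (f(S) - a) * fprime(S) * x   # the formula above
eps = 1e-6
numeric = np.array([
    (Ek(w + eps * np.eye(3)[j]) - Ek(w - eps * np.eye(3)[j])) / (2 * eps)
    for j in range(3)
])
print(np.allclose(analytic, numeric))  # True
```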
Now go one layer down. Each output yi = yi(x1, ..., xm) depends on the hidden activations, and each hidden activation xj = xj(v0j, ..., vrj) depends on its own weights.

If we knew Dk as a function of the hidden activations, Dk = Dk(x1, ..., xm), the same trick would give the derivatives with respect to the hidden weights:
∂Ek/∂vqj = ∂Dk/∂xj · ∂xj/∂vqj

∂Dk/∂xj is the only thing we don't know, so let's calculate it!
Now let's consider f as a function of the hidden activations and calculate its derivative: since yi = f(Si) and Si is linear in xj with coefficient wji,
∂yi/∂xj = f'(Si)·wji

Now we are able to calculate the derivative of Dk with respect to each hidden activation, for example x1:
∂Dk/∂x1 = ∑_{i=1}^{n} ∂Dk/∂yi · ∂yi/∂x1 = ∑_{i=1}^{n} 2(yi - ai)·f'(Si)·w1i
Now do the same actions in the general case: propagate ∂Dk/∂x backward layer by layer, each layer reusing the derivatives computed for the layer above.
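All of the above as a minimal one-hidden-layer sketch in numpy (tanh assumed for f, so f'(S) = 1 - tanh²(S); the shapes and learning rate are illustrative):

```python
import numpy as np

def forward(W1, W2, x):
    """Hidden activations h = f(S1), outputs y = f(S2); bias inputs prepended."""
    S1 = W1 @ np.append(1.0, x)
    h = np.tanh(S1)
    S2 = W2 @ np.append(1.0, h)
    return S1, h, S2, np.tanh(S2)

def backprop_step(W1, W2, x, a, eps=0.1):
    S1, h, S2, y = forward(W1, W2, x)
    # output layer: dDk/dyi = 2(yi - ai), dyi/dSi = f'(Si)
    delta2 = 2 * (y - a) * (1 - np.tanh(S2) ** 2)
    grad_W2 = np.outer(delta2, np.append(1.0, h))   # dEk/dw_ji = delta2_i * x_j
    # hidden layer: dDk/dx_j = sum_i delta2_i * w_ji (skip the bias column)
    dD_dh = W2[:, 1:].T @ delta2
    delta1 = dD_dh * (1 - np.tanh(S1) ** 2)
    grad_W1 = np.outer(delta1, np.append(1.0, x))
    # gradient descent step on both layers
    return W1 - eps * grad_W1, W2 - eps * grad_W2

# usage: one update on a single (Xi, Ai) pair
W1 = np.random.randn(3, 3) * 0.5   # 3 hidden units, 2 inputs + bias
W2 = np.random.randn(2, 4) * 0.5   # 2 outputs, 3 hidden units + bias
W1, W2 = backprop_step(W1, W2, x=np.array([1.0, -1.0]), a=np.array([1.0, -1.0]))
```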
Offline Learning

Train the ANN on the example set, then use the ANN on the real data.
But what if the real data is not i.i.d.?
Online Learning

Learn one instance at a time:
1. Receive an instance
2. Predict the outcome
3. Obtain the real outcome

However, in practice it is not always possible to obtain the real outcome.
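A minimal sketch of this loop around a perceptron (the standard Δwi = ε·d·xi update is used here for simplicity; stream and get_outcome are hypothetical stand-ins for the instance source and the possibly unavailable real outcome):

```python
import numpy as np

def online_perceptron(stream, get_outcome, eps=0.1):
    """stream yields 2-d feature vectors; get_outcome(x) returns the real
    outcome (+1/-1) when it is available, else None."""
    w = np.zeros(3)
    for x in stream:                    # 1. receive an instance
        xb = np.append(1.0, x)          # constant bias input
        y = 1 if w @ xb >= 0 else -1    # 2. predict the outcome
        a = get_outcome(x)              # 3. obtain the real outcome
        if a is not None:               # learn only when feedback arrives
            w += eps * (a - y) * xb
    return w
```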
Spiking Neural Network
● The third generation of ANN models
● Adds the concept of time to the neuron model
● One SNN neuron can replace hundreds of hidden neurons in a conventional ANN
● Requires huge computational power
TrueNorth by IBM
1 million neurons
256 million synapses
Power: < 100 mW