TRANSCRIPT
Artificial Neural Networks
Ildar Nurgaliev
Warren McCulloch and Walter Pitts (1943) created a
computational model for neural networks based on
mathematics and algorithms. They called this model
threshold logic.
Neural networks, as used in artificial intelligence, have
traditionally been viewed as simplified models of neural
processing in the brain.
Introduction
Biological Network and Neural System
Perceptron
Bias: Activation Weight
Perceptron and Boolean Functions: Conjunction

 x1   x2   x1 ⋀ x2
 -1   -1     -1
 -1    1     -1
  1   -1     -1
  1    1      1

sign(w0 + w1·x1 + w2·x2) = x1 ⋀ x2, with weights w0 = -1, w1 = 1, w2 = 1.
Perceptron and Boolean Functions: Disjunction

 x1   x2   x1 ⋁ x2
 -1   -1     -1
 -1    1      1
  1   -1      1
  1    1      1

sign(w0 + w1·x1 + w2·x2) = x1 ⋁ x2, with weights w0 = 1, w1 = 1, w2 = 1.
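A quick check of both weight settings in code (a minimal sketch; taking sign(0) = +1 is an assumed tie-break, since the slides do not say how ties resolve):

```python
def perceptron(x1, x2, w0, w1, w2):
    """sign(w0 + w1*x1 + w2*x2), taking sign(0) as +1 (assumed convention)."""
    return 1 if w0 + w1 * x1 + w2 * x2 >= 0 else -1

for x1 in (-1, 1):
    for x2 in (-1, 1):
        conj = perceptron(x1, x2, w0=-1, w1=1, w2=1)  # reproduces x1 AND x2
        disj = perceptron(x1, x2, w0=1, w1=1, w2=1)   # reproduces x1 OR x2
        print(x1, x2, conj, disj)
```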
Geometric Interpretation

[Figure: the weights define a separating line; the bias w0 shifts it away from the origin.]
Perceptron Learning
 x1   x2   x1 ⋀ x2
 -1   -1     -1
 -1    1     -1
  1   -1     -1
  1    1      1

sign(w0 + w1·x1 + w2·x2) = x1 ⋀ x2
Ensemble-Teacher Learning

[Figure: a perceptron with all three weights initialized to 0.5 misclassifies a training example: the net answers y = 1 while the teacher's right answer is a = -1. Arrows mark the direction each weight must move.]

Right answer: a
Net answer: y
Direction of learning: d = a - y = -2
Weight change: Δwi = ε·d·xi·|wi|

[Figure: after the update the weighted sum is negative and the net answers -1, matching the teacher.]
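A minimal sketch of one such update using the slide's rule. The concrete numbers (inputs (1, -1), right answer -1, ε = 0.3) are assumptions chosen to reproduce d = -2, since the figure's exact values are not fully recoverable:

```python
import numpy as np

eps = 0.3                          # assumed learning rate
w = np.array([0.5, 0.5, 0.5])      # w0 (bias), w1, w2, all 0.5 as on the slide
x = np.array([1.0, 1.0, -1.0])     # constant bias input, x1, x2 (assumed)
a = -1                             # teacher's right answer

y = 1 if w @ x >= 0 else -1        # net answer: sign(0.5) = 1
d = a - y                          # direction of learning: -2
w += eps * d * x * np.abs(w)       # the slide's rule: dw_i = eps*d*x_i*|w_i|
print(d, w)                        # -2 [0.2 0.2 0.8]
```

Note how w2 moves up while w0 and w1 move down: the sign of xi decides each weight's direction, matching the arrows on the figure.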
XOR-function
Doesn't work?

 x1   x2   x1 ⊕ x2
 -1   -1     -1
 -1    1      1
  1   -1      1
  1    1     -1

No choice of weights makes sign(w0 + w1·x1 + w2·x2) match this table: the positive and negative examples are not linearly separable.
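A quick brute-force check of that claim (the weight grid is an arbitrary choice for illustration):

```python
import itertools
import numpy as np

# XOR truth table rows: (x1, x2, target)
rows = [(-1, -1, -1), (-1, 1, 1), (1, -1, 1), (1, 1, -1)]
grid = np.linspace(-2, 2, 41)      # arbitrary grid of candidate weights
ok = any(
    all((1 if w0 + w1 * x1 + w2 * x2 >= 0 else -1) == t for x1, x2, t in rows)
    for w0, w1, w2 in itertools.product(grid, repeat=3)
)
print(ok)  # False: no single perceptron reproduces XOR
```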
Solution: Multilayer Perceptron

[Figure: a network with an input layer, a hidden layer, and an output layer.]
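A minimal sketch of how a hidden layer fixes XOR, with hand-picked weights (these particular weights are an illustration, not from the slides): one hidden unit computes OR, another computes AND, and the output unit fires when OR holds but AND does not.

```python
def unit(x1, x2, w0, w1, w2):
    return 1 if w0 + w1 * x1 + w2 * x2 >= 0 else -1

def xor(x1, x2):
    h1 = unit(x1, x2, 1, 1, 1)      # hidden unit: x1 OR x2
    h2 = unit(x1, x2, -1, 1, 1)     # hidden unit: x1 AND x2
    return unit(h1, h2, -1, 1, -1)  # output: h1 AND (NOT h2)

for x1 in (-1, 1):
    for x2 in (-1, 1):
        print(x1, x2, xor(x1, x2))  # matches the XOR truth table
```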
Learning as a Function Minimization

Given:
X = (X1, ..., Xk) — input vectors, Xi ∈ R^n
A = (A1, ..., Ak) — correct output vectors, Ai ∈ R^m
(X, A) — learning set
W — vector containing all the weights
N(W, X) — the neural network's function
Y = N(W, X) — the neural network's response, Y ∈ R^m
D(Y, A) = ∑_{j=1}^{m} (Y[j] - A[j])² — error function
Di(Y) = D(Y, Ai) — error function on the i-th example
Ei(W) = Di(N(W, Xi)) — the network's error on the i-th example
E(W) = ∑_{i=1}^{k} Ei(W) — the network's error on the whole set

Goal:
Find a vector W such that E(W) → min (learning on the whole set)
Find a vector W such that Ei(W) → min (learning on a particular example)
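A quick numeric check of these definitions on a made-up learning set (k = 2, m = 2; the values are arbitrary):

```python
import numpy as np

def D(Y, A):
    """Error function: sum over j of (Y[j] - A[j])^2."""
    return float(np.sum((Y - A) ** 2))

A_set = [np.array([1.0, -1.0]), np.array([-1.0, 1.0])]  # correct outputs Ai
Y_set = [np.array([0.8, -0.6]), np.array([-1.0, 0.5])]  # responses Y = N(W, Xi)
E = sum(D(Y, Ai) for Y, Ai in zip(Y_set, A_set))        # E(W) over the set
print(E)  # 0.04 + 0.16 + 0.0 + 0.25 = 0.45
```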
Gradient Descent Method

Algorithm for a single variable:
1. Initialize x1 with a random value from R
2. i = 1
3. x_{i+1} = x_i - ε·f'(x_i)
4. i++
5. If f(x_i) - f(x_{i+1}) > c, go to 3

Algorithm for the multivariate case:
1. Initialize W1 with a random value from R^n
2. i = 1
3. W_{i+1} = W_i - ε·∇f(W_i)
4. i++
5. If f(W_i) - f(W_{i+1}) > c, go to 3
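The same loop as a minimal Python sketch (the quadratic test function, ε, and c are illustrative choices):

```python
import numpy as np

def gradient_descent(f, grad, w, eps=0.1, c=1e-8, max_iter=10_000):
    """Repeat w <- w - eps * grad(w) while the improvement exceeds c."""
    for _ in range(max_iter):
        w_next = w - eps * grad(w)
        if f(w) - f(w_next) <= c:      # stopping rule from step 5
            return w_next
        w = w_next
    return w

# toy quadratic bowl with its minimum at (1, -2)
f = lambda w: (w[0] - 1) ** 2 + (w[1] + 2) ** 2
grad = lambda w: np.array([2 * (w[0] - 1), 2 * (w[1] + 2)])
print(gradient_descent(f, grad, np.random.randn(2)))  # approx. [1, -2]
```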
Backpropagation Method

(Same setup as above: learning set (X, A), weight vector W, network function N(W, X), per-example error Ei(W), total error E(W).)
Dk(y1, y2) = (y1 - a1)² + (y2 - a2)²

Goal: decrease this function, using gradient descent, in order to increase the accuracy of the network.

Calculate the partial derivatives:
∂Dk/∂y1 = 2(y1 - a1)
∂Dk/∂y2 = 2(y2 - a2)
But y1 is itself a function. Let's consider it as a function of the weights; then we can calculate its partial derivatives:

y1 = y1(w01, w11, w21) = f(S1), where S1 = w01 + w11·x1 + w21·x2

For example:
∂y1/∂w21 = f'(S1)·x2

and y1 does not depend on the second output neuron's weights, so ∂y1/∂w22 = 0.

Similarly y2 = y2(w02, w12, w22) = f(S2), so

Ek(W) = Dk(y1(w01, w11, w21), y2(w02, w12, w22))

and now we are able to calculate the partial derivative of Ek with respect to each weight:
∂Ek/∂wj1 = ∂Dk/∂y1 · ∂y1/∂wj1 = 2(y1 - a1)·f'(S1)·xj
The same actions in the general case:

Dk(y1, ..., yn) = (y1 - a1)² + ... + (yn - an)²

yi = f(Si), where Si = w0i + ∑_{j} wji·xj

Formula for the derivative of Ek with respect to each weight:
∂Ek/∂wji = ∂Dk/∂yi · ∂yi/∂wji = 2(yi - ai)·f'(Si)·xj
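A finite-difference sanity check of this formula for a single output neuron (tanh is an assumed choice of f; the inputs, weights, and target are arbitrary):

```python
import numpy as np

f, fprime = np.tanh, lambda s: 1 - np.tanh(s) ** 2
x = np.array([1.0, 0.5, -1.5])      # x0 = 1 is the constant bias input
w = np.array([0.3, -0.2, 0.8])      # weights w0i, w1i, w2i of neuron i
a = 0.7                             # target ai

Ek = lambda w: (f(w @ x) - a) ** 2  # single-output error
S = w @ x
analytic = 2 * (f(S) - a) * fprime(S) * x   # the formula above
eps = 1e-6
numeric = np.array([
    (Ek(w + eps * np.eye(3)[j]) - Ek(w - eps * np.eye(3)[j])) / (2 * eps)
    for j in range(3)
])
print(np.allclose(analytic, numeric))  # True
```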
Now go one layer down. Each output yi = yi(x1, ..., xm) depends on the hidden activations, and each hidden activation xj = xj(v0j, ..., vrj) depends on its own weights.

If we knew Dk as a function of the hidden activations, Dk = Dk(x1, ..., xm), the same trick would give the derivatives with respect to the hidden weights:
∂Ek/∂vqj = ∂Dk/∂xj · ∂xj/∂vqj

∂Dk/∂xj is the only thing we don't know, so let's calculate it!
Now let's consider f as a function of the hidden activations and calculate its derivative: since yi = f(Si) and Si is linear in xj with coefficient wji,
∂yi/∂xj = f'(Si)·wji

Now we are able to calculate the derivative of Dk with respect to each hidden activation, for example x1:
∂Dk/∂x1 = ∑_{i=1}^{n} ∂Dk/∂yi · ∂yi/∂x1 = ∑_{i=1}^{n} 2(yi - ai)·f'(Si)·w1i
Now do the same actions in the general case: propagate ∂Dk/∂x backward layer by layer, each layer reusing the derivatives computed for the layer above.
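All of the above as a minimal one-hidden-layer sketch in numpy (tanh assumed for f, so f'(S) = 1 - tanh²(S); the shapes and learning rate are illustrative):

```python
import numpy as np

def forward(W1, W2, x):
    """Hidden activations h = f(S1), outputs y = f(S2); bias inputs prepended."""
    S1 = W1 @ np.append(1.0, x)
    h = np.tanh(S1)
    S2 = W2 @ np.append(1.0, h)
    return S1, h, S2, np.tanh(S2)

def backprop_step(W1, W2, x, a, eps=0.1):
    S1, h, S2, y = forward(W1, W2, x)
    # output layer: dDk/dyi = 2(yi - ai), dyi/dSi = f'(Si)
    delta2 = 2 * (y - a) * (1 - np.tanh(S2) ** 2)
    grad_W2 = np.outer(delta2, np.append(1.0, h))   # dEk/dw_ji = delta2_i * x_j
    # hidden layer: dDk/dx_j = sum_i delta2_i * w_ji (skip the bias column)
    dD_dh = W2[:, 1:].T @ delta2
    delta1 = dD_dh * (1 - np.tanh(S1) ** 2)
    grad_W1 = np.outer(delta1, np.append(1.0, x))
    # gradient descent step on both layers
    return W1 - eps * grad_W1, W2 - eps * grad_W2

# usage: one update on a single (Xi, Ai) pair
W1 = np.random.randn(3, 3) * 0.5   # 3 hidden units, 2 inputs + bias
W2 = np.random.randn(2, 4) * 0.5   # 2 outputs, 3 hidden units + bias
W1, W2 = backprop_step(W1, W2, x=np.array([1.0, -1.0]), a=np.array([1.0, -1.0]))
```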
Offline Learning

Train the ANN on the example set, then use the ANN on the real data.
But what if the real data is not i.i.d.?
Online Learning

Learn one instance at a time:
1. Receive an instance
2. Predict the outcome
3. Obtain the real outcome

However, in practice it is not always possible to obtain the real outcome.
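A minimal sketch of this loop around a perceptron (the standard Δwi = ε·d·xi update is used here for simplicity; stream and get_outcome are hypothetical stand-ins for the instance source and the possibly unavailable real outcome):

```python
import numpy as np

def online_perceptron(stream, get_outcome, eps=0.1):
    """stream yields 2-d feature vectors; get_outcome(x) returns the real
    outcome (+1/-1) when it is available, else None."""
    w = np.zeros(3)
    for x in stream:                    # 1. receive an instance
        xb = np.append(1.0, x)          # constant bias input
        y = 1 if w @ xb >= 0 else -1    # 2. predict the outcome
        a = get_outcome(x)              # 3. obtain the real outcome
        if a is not None:               # learn only when feedback arrives
            w += eps * (a - y) * xb
    return w
```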
Spiking Neural Network
● The third generation of ANN models
● Adds the concept of time to the neuron model
● One SNN neuron can replace hundreds of hidden neurons in a conventional ANN
● Requires huge computational power
TrueNorth by IBM
1 million neurons
256 million synapses
Power: < 100 mW