Introduction to Artificial Neural Networks. Lecturer: 虞台文
TRANSCRIPT
Introduction to Artificial Neural Networks
Lecturer: 虞台文
Content
Fundamental Concepts of ANNs
Basic Models and Learning Rules
– Neuron Models
– ANN structures
– Learning
Distributed Representations
Conclusions
Introduction to Artificial Neural Networks
Fundamental Concepts of ANNs
What is an ANN? Why ANNs?
ANN – Artificial Neural Networks
– To simulate human brain behavior.
– A new generation of information processing systems.
Applications
Pattern Matching
Pattern Recognition
Associative Memory (Content Addressable Memory)
Function Approximation
Learning
Optimization
Vector Quantization
Data Clustering
. . .
Traditional computers are inefficient at these tasks, although their raw computation speed is faster.
The Configuration of ANNs
An ANN consists of a large number of interconnected processing elements called neurons.
– A human brain consists of ~10^11 neurons of many different types.
How does an ANN work?
– Collective behavior.
The Biological Neuron
Dendrites
Axon
Synapse (the junction between the nerve fibers of two neurons)
The Biological Neuron
Excitatory or Inhibitory
The Artificial Neuron

[Figure: a neuron with inputs x_1, x_2, …, x_m, weights w_{i1}, w_{i2}, …, w_{im}, integration function f(·), activation function a(·), threshold θ_i, and output y_i.]
The Artificial Neuron

[Figure: the same neuron as above.]

f_i = \sum_{j=1}^{m} w_{ij} x_j - \theta_i

y_i(t+1) = a(f_i(t))

a(f) = \begin{cases} 1 & \text{if } f \ge 0 \\ 0 & \text{otherwise} \end{cases}
The Artificial Neuron

[Figure: the same neuron as above.]

w_{ij} > 0: excitatory connection
w_{ij} < 0: inhibitory connection
w_{ij} = 0: no connection
The Artificial Neuron

[Figure: the same neuron as above.]

Proposed by McCulloch and Pitts [1943]: M-P neurons.
What can be done by M-P neurons?

A hard limiter. A binary threshold unit. Hyperspace separation.

y = \begin{cases} 1 & \text{if } f = w_1 x_1 + w_2 x_2 - \theta \ge 0 \\ 0 & \text{otherwise} \end{cases}

[Figure: a two-input M-P neuron with weights w_1, w_2 and threshold θ; the line w_1 x_1 + w_2 x_2 = θ separates the (x_1, x_2) plane into the regions y = 1 and y = 0.]
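The binary threshold unit above is easy to sketch in code. This is a minimal illustration, not part of the original slides; the weights and threshold below are assumptions chosen so the neuron realizes logical AND.

```python
# An M-P neuron: a binary threshold unit that separates the plane
# along the line w1*x1 + w2*x2 = theta.

def mp_neuron(x, w, theta):
    """Fire (output 1) iff the weighted sum reaches the threshold."""
    f = sum(wj * xj for wj, xj in zip(w, x)) - theta
    return 1 if f >= 0 else 0

# With (assumed) w = (1, 1) and theta = 1.5, the neuron computes AND:
w, theta = (1.0, 1.0), 1.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, mp_neuron(x, w, theta))   # fires only for (1, 1)
```

Moving the line (changing w and theta) changes which half-plane maps to 1, which is exactly the hyperspace separation the slide describes.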
What will ANNs be?

A neurally inspired mathematical model. Consists of a large number of highly interconnected PEs. Its connections (weights) hold knowledge. The response of a PE depends only on local information. Its collective behavior demonstrates the computational power. With learning, recalling, and generalization capability.
Three Basic Entities of ANN Models
Models of neurons or PEs. Models of synaptic interconnections and structures. Training or learning rules.
Introduction to Artificial Neural Networks

Basic Models and Learning Rules
– Neuron Models
– ANN structures
– Learning
Processing Elements

[Figure: a PE with integration function f(·), activation function a(·), and threshold θ_i.]

What integration functions may we have?
What activation functions may we have?

Extensions of M-P neurons
Integration Functions

Quadratic function:

f_i = \sum_{j=1}^{m} w_{ij} x_j^2 - \theta_i

Spherical function:

f_i = \sum_{j=1}^{m} (x_j - w_{ij})^2 - \theta_i

Polynomial function:

f_i = \sum_{j=1}^{m} \sum_{k=1}^{m} w_{ijk} x_j x_k + x_j + x_k - \theta_i

M-P neuron (linear function):

f_i = net_i = \sum_{j=1}^{m} w_{ij} x_j - \theta_i
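As a quick sketch, three of the integration functions above can be written directly from their formulas (the polynomial form is omitted; the sample x, w, theta values are illustrative assumptions):

```python
# Integration functions for a single neuron i; names follow the slides:
# w = weight vector, x = input vector, theta = threshold.

def linear(x, w, theta):
    # M-P neuron: sum_j w_j * x_j - theta
    return sum(wj * xj for wj, xj in zip(w, x)) - theta

def quadratic(x, w, theta):
    # sum_j w_j * x_j^2 - theta
    return sum(wj * xj**2 for wj, xj in zip(w, x)) - theta

def spherical(x, w, theta):
    # sum_j (x_j - w_j)^2 - theta: distance of x from the weight vector
    return sum((xj - wj)**2 for wj, xj in zip(w, x)) - theta

x, w = [1.0, 2.0], [0.5, -1.0]
print(linear(x, w, 0.0))     # -> -1.5
print(quadratic(x, w, 0.0))  # -> -3.5
print(spherical(x, w, 0.0))  # -> 9.25
```

The spherical form shows why these are "extensions": the net input can measure distance to a stored prototype rather than a weighted sum.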
Activation Functions

M-P neuron (step function):

a(f) = \begin{cases} 1 & \text{if } f \ge 0 \\ 0 & \text{otherwise} \end{cases}

[Plot: a jumps from 0 to 1 at f = 0.]
Activation Functions

Hard limiter (threshold function):

a(f) = \operatorname{sgn}(f) = \begin{cases} +1 & \text{if } f \ge 0 \\ -1 & \text{if } f < 0 \end{cases}

[Plot: a = -1 for f < 0 and a = +1 for f \ge 0.]
Activation Functions

Ramp function:

a(f) = \begin{cases} 1 & \text{if } f \ge 1 \\ f & \text{if } 0 \le f < 1 \\ 0 & \text{if } f < 0 \end{cases}

[Plot: a rises linearly from 0 to 1 over 0 \le f \le 1.]
Activation Functions

Unipolar sigmoid function:

a(f) = \frac{1}{1 + e^{-\lambda f}}

[Plot: sigmoid over f ∈ [-4, 4], ranging from 0 to 1.]
Activation Functions

Bipolar sigmoid function:

a(f) = \frac{2}{1 + e^{-\lambda f}} - 1

[Plot: bipolar sigmoid over f ∈ [-4, 4], ranging from -1 to 1.]
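The five activation functions above fit in a few lines; this sketch uses `lam` for the steepness parameter λ of the two sigmoids:

```python
import math

# Activation functions from the slides, applied to the net input f.

def step(f):                       # M-P neuron
    return 1.0 if f >= 0 else 0.0

def hard_limiter(f):               # sgn(f)
    return 1.0 if f >= 0 else -1.0

def ramp(f):                       # clipped identity on [0, 1]
    return min(1.0, max(0.0, f))

def unipolar_sigmoid(f, lam=1.0):  # range (0, 1)
    return 1.0 / (1.0 + math.exp(-lam * f))

def bipolar_sigmoid(f, lam=1.0):   # range (-1, 1)
    return 2.0 / (1.0 + math.exp(-lam * f)) - 1.0

print(unipolar_sigmoid(0.0))   # 0.5 at the origin
print(bipolar_sigmoid(0.0))    # 0.0 at the origin
```

Note the bipolar sigmoid is just the unipolar one rescaled from (0, 1) to (-1, 1), mirroring the step/hard-limiter pair.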
Example: Activation Surfaces

[Figure: three lines L1, L2, L3 partition the x-y plane; a network with inputs x, y feeds three units L1, L2, L3.]
Example: Activation Surfaces

L1: x - 1 = 0 (weights (1, 0), θ_1 = 1)
L2: y - 1 = 0 (weights (0, 1), θ_2 = 1)
L3: -x - y + 4 = 0 (weights (-1, -1), θ_3 = -4)
Example: Activation Surfaces

[Figure: the outputs of L1, L2, L3 assign a 3-bit region code to each region of the x-y plane: 001, 101, 100, 110, 010, 011, 111. The triangular region bounded by the three lines has code 111.]
Example: Activation Surfaces

[Figure: a fourth unit L4 takes the outputs of L1, L2, L3 as inputs, with weights (1, 1, 1) and θ_4 = 2.5, and outputs z. z = 1 inside the triangular region (code 111) and z = 0 elsewhere.]
Example: Activation Surfaces

[Figure: the activation surface of z when the units are M-P neurons.]

M-P neuron (step function):

a(f) = \begin{cases} 1 & \text{if } f \ge 0 \\ 0 & \text{otherwise} \end{cases}
Example: Activation Surfaces

[Figure: the activation surfaces of z when the units use the unipolar sigmoid function with λ = 2, 3, 5, 10; larger λ gives a steeper surface.]

Unipolar sigmoid function:

a(f) = \frac{1}{1 + e^{-\lambda f}}
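The whole activation-surface example can be checked numerically. The sketch below assumes the line equations and weights reconstructed above (L1: x = 1, L2: y = 1, L3: x + y = 4; L4 with weights (1, 1, 1) and θ = 2.5):

```python
# Three M-P neurons compute the 3-bit region code of a point (x, y);
# a fourth neuron fires only for code 111 (the triangular region).

def fire(ws, xs, theta):
    return 1 if sum(w * x for w, x in zip(ws, xs)) - theta >= 0 else 0

def z(x, y):
    l1 = fire((1, 0), (x, y), 1)      # x - 1 >= 0
    l2 = fire((0, 1), (x, y), 1)      # y - 1 >= 0
    l3 = fire((-1, -1), (x, y), -4)   # -x - y + 4 >= 0
    # L4 needs all three bits on: 1+1+1 - 2.5 >= 0 only for (1,1,1)
    return fire((1, 1, 1), (l1, l2, l3), 2.5)

print(z(2, 1.5))   # inside the triangle -> 1
print(z(0, 0))     # outside -> 0
```

The choice θ = 2.5 makes L4 an AND of the three region bits, which is why z carves out exactly the 111 region.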
Introduction to Artificial Neural Networks

Basic Models and Learning Rules
– Neuron Models
– ANN structures
– Learning
ANN Structure (Connections)

Single-Layer Feedforward Networks

[Figure: inputs x_1, x_2, …, x_m fully connected to outputs y_1, y_2, …, y_n through weights w_{11}, w_{12}, …, w_{nm}.]
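A single-layer feedforward pass is just one integration plus activation per output neuron. A minimal sketch, with an assumed 2x2 weight matrix and step activation for illustration:

```python
# Single-layer feedforward network: row i of W holds the weights of
# neuron i; thetas holds the per-neuron thresholds.

def forward(W, x, thetas):
    ys = []
    for wi, theta in zip(W, thetas):
        f = sum(wij * xj for wij, xj in zip(wi, x)) - theta
        ys.append(1 if f >= 0 else 0)     # M-P (step) activation
    return ys

W = [[1.0, -1.0],    # neuron 1
     [0.5,  0.5]]    # neuron 2
print(forward(W, [1.0, 0.0], [0.0, 1.0]))  # -> [1, 0]
```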
Multilayer Feedforward Networks

[Figure: an input layer x_1, x_2, …, x_m, one or more hidden layers, and an output layer y_1, y_2, …, y_n.]
Multilayer Feedforward Networks

Pattern Recognition: Input → Analysis → Classification → Output

Learning: where does the knowledge come from?
Single Node with Feedback to Itself

[Figure: a node whose output feeds back to its own input through a feedback loop.]
Single-Layer Recurrent Networks

[Figure: a single layer with inputs x_1, x_2, …, x_m, outputs y_1, y_2, …, y_n, and feedback connections among the units.]
Multilayer Recurrent Networks

[Figure: multiple layers with inputs x_1, x_2, x_3, outputs y_1, y_2, y_3, and feedback connections between layers.]
Introduction to Artificial Neural Networks

Basic Models and Learning Rules
– Neuron Models
– ANN structures
– Learning
Learning

Consider an ANN with n neurons, each with m adaptive weights.

Weight matrix:

W = \begin{bmatrix} \mathbf{w}_1^T \\ \mathbf{w}_2^T \\ \vdots \\ \mathbf{w}_n^T \end{bmatrix} = \begin{bmatrix} w_{11} & w_{12} & \cdots & w_{1m} \\ w_{21} & w_{22} & \cdots & w_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ w_{n1} & w_{n2} & \cdots & w_{nm} \end{bmatrix}

To “learn” the weight matrix. How?
Learning Rules
– Supervised learning
– Reinforcement learning
– Unsupervised learning
Supervised Learning

Learning with a teacher. Learning by examples.

Training set:

T = \{ (\mathbf{x}^{(1)}, \mathbf{d}^{(1)}), (\mathbf{x}^{(2)}, \mathbf{d}^{(2)}), \ldots, (\mathbf{x}^{(k)}, \mathbf{d}^{(k)}), \ldots \}
Supervised Learning

[Figure: input x feeds the ANN (weights W), which produces y; an error-signal generator compares y with the desired output d and feeds the error back to adjust W.]

T = \{ (\mathbf{x}^{(1)}, \mathbf{d}^{(1)}), (\mathbf{x}^{(2)}, \mathbf{d}^{(2)}), \ldots, (\mathbf{x}^{(k)}, \mathbf{d}^{(k)}), \ldots \}
Reinforcement Learning

Learning with a critic. Learning by comments.

[Figure: input x feeds the ANN (weights W), which produces y; a critic-signal generator evaluates y and feeds a reinforcement signal back to adjust W.]
Unsupervised Learning

Self-organizing. Clustering.
– Form proper clusters by discovering the similarities and dissimilarities among objects.

[Figure: input x feeds the ANN (weights W), which produces y; no teacher or critic signal is available.]
The General Weight Learning Rule

[Figure: a neuron with inputs x_1, x_2, …, x_{m-1}, weights w_{i1}, w_{i2}, …, w_{i,m-1}, threshold θ_i, and output y_i.]

Input: net_i = \sum_{j=1}^{m-1} w_{ij} x_j - \theta_i

Output: y_i = a(net_i)

We want to learn the weights and the bias.
The General Weight Learning Rule

Input: net_i = \sum_{j=1}^{m-1} w_{ij} x_j - \theta_i

Let x_m = -1 and w_{im} = \theta_i. Then

net_i = \sum_{j=1}^{m} w_{ij} x_j

[Figure: the neuron redrawn with the threshold absorbed as an extra input x_m = -1 with weight w_{im} = θ_i.]
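The threshold-absorbing trick above is easy to verify numerically; a small sketch with assumed sample values:

```python
# Bias folding: appending x_m = -1 to the input and w_im = theta_i to
# the weights turns "weighted sum minus threshold" into a plain dot
# product, so the threshold is learned like any other weight.

def net_with_threshold(w, x, theta):
    return sum(wj * xj for wj, xj in zip(w, x)) - theta

def net_folded(w, x, theta):
    w_aug = list(w) + [theta]   # w_im = theta_i
    x_aug = list(x) + [-1.0]    # x_m = -1
    return sum(wj * xj for wj, xj in zip(w_aug, x_aug))

w, x, theta = [0.2, 0.8], [1.0, 0.5], 0.3
print(net_with_threshold(w, x, theta))  # both give the same net input
print(net_folded(w, x, theta))
```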
The General Weight Learning Rule

[Figure: the same neuron, with augmented input x_m = -1, weight w_{im} = θ_i, and output y_i.]

We want to learn \mathbf{w}_i = (w_{i1}, w_{i2}, \ldots, w_{im})^T.

\Delta \mathbf{w}_i(t) = ?
The General Weight Learning Rule

[Figure: input x and weight vector w_i produce output y_i; a learning-signal generator computes the learning signal r from w_i, x, and the desired output d_i.]

r = f_r(\mathbf{w}_i, \mathbf{x}, d_i)

\Delta \mathbf{w}_i(t) = \eta \, r(t) \, \mathbf{x}(t)

where \eta is the learning rate.

Discrete-Time Weight Modification Rule:

\mathbf{w}_i^{(t+1)} = \mathbf{w}_i^{(t)} + \eta \, f_r(\mathbf{w}_i^{(t)}, \mathbf{x}^{(t)}, d_i^{(t)}) \, \mathbf{x}^{(t)}

Continuous-Time Weight Modification Rule:

\frac{d \mathbf{w}_i(t)}{dt} = \eta \, r \, \mathbf{x}(t)
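The discrete-time rule can be sketched with one concrete, assumed choice of learning signal, r = d - y (the perceptron error signal); the AND training set and learning-rate value below are also illustrative:

```python
# Discrete-time weight update: w <- w + eta * r * x, with r = d - y.

def step(f):
    return 1 if f >= 0 else 0

def train(samples, m, eta=0.5, epochs=10):
    w = [0.0] * m                       # includes the folded bias weight
    for _ in range(epochs):
        for x, d in samples:
            y = step(sum(wj * xj for wj, xj in zip(w, x)))
            r = d - y                   # learning signal f_r(w, x, d)
            w = [wj + eta * r * xj for wj, xj in zip(w, x)]
    return w

# Learn AND, with the threshold folded in as a third input fixed at -1:
samples = [((0, 0, -1), 0), ((0, 1, -1), 0), ((1, 0, -1), 0), ((1, 1, -1), 1)]
w = train(samples, 3)
print([step(sum(wj * xj for wj, xj in zip(w, x))) for x, _ in samples])
# -> [0, 0, 0, 1]
```

Swapping in a different f_r (e.g., Hebb's rule below) changes the learning behavior without touching the update loop.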
Hebb’s Learning Law

• Hebb [1949] hypothesized that when an axonal input from neuron A to neuron B causes B to immediately emit a pulse (fire), and this situation happens repeatedly or persistently,
• then the efficacy of that axonal input, in terms of its ability to help B fire in the future, is somehow increased.
• Hebb’s learning rule is an unsupervised learning rule.
Hebb’s Learning Law

r = f_r(\mathbf{w}_i, \mathbf{x}) = a(\mathbf{w}_i^T \mathbf{x}) = y_i

\Delta \mathbf{w}_i(t) = \eta \, y_i \, \mathbf{x}(t), \qquad \Delta w_{ij} = \eta \, y_i \, x_j
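Hebb's rule drops into the same update scheme, with the neuron's own output as the learning signal. A toy sketch with assumed initial weights and input:

```python
# Hebbian update: delta w_ij = eta * y_i * x_j. No desired output is
# needed, which is why this is an unsupervised rule.

def step(f):
    return 1 if f >= 0 else 0

def hebb_update(w, x, eta=0.1):
    y = step(sum(wj * xj for wj, xj in zip(w, x)))
    return [wj + eta * y * xj for wj, xj in zip(w, x)], y

w = [0.1, -0.2, 0.0]
x = [1.0, 0.0, 1.0]
w, y = hebb_update(w, x)
print(y, w)   # weights on active inputs grow when the neuron fires
```

Note the update only strengthens connections when input and output are simultaneously active, exactly Hebb's hypothesis.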
Introduction to Artificial Neural Networks
Distributed Representations
Distributed Representations

• Distributed Representation:
– An entity is represented by a pattern of activity distributed over many PEs.
– Each processing element is involved in representing many different entities.
• Local Representation:
– Each entity is represented by one PE.
Example

         P0 P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13 P14 P15
Dog      +  _  +  +  _  _  _  _  +  +  +   +   +   _   _   _
Cat      +  _  +  +  _  _  _  _  +  _  +   _   +   +   _   +
Bread    +  +  _  +  _  +  +  _  +  _  _   +   +   +   +   _
Advantages

Act as a content addressable memory.

P0 P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13 P14 P15
(a partial pattern with only four PEs active: + + + +)

What is this?
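Content-addressable recall over the patterns above can be sketched as a best-match lookup. The patterns are transcribed from the Dog/Cat/Bread table; the particular four-PE probe below is an assumption for illustration (the slide does not say which PEs are active):

```python
# Recall by content: return the stored pattern that best agrees with
# the known (+/-) positions of a partial pattern; ' ' means unknown.

PATTERNS = {
    "Dog":   "+-++----+++++---",
    "Cat":   "+-++----+-+-++-+",
    "Bread": "++-+-++-+--++++-",
}

def recall(partial):
    def score(stored):
        return sum(1 for p, s in zip(partial, stored) if p != " " and p == s)
    return max(PATTERNS, key=lambda name: score(PATTERNS[name]))

# An assumed probe: only P9..P12 known to be '+', everything else unknown.
probe = " " * 9 + "++++" + " " * 3
print(recall(probe))   # -> Dog
```

A real ANN would do this completion through its weights rather than an explicit search, but the input/output behavior is the same idea.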
Advantages

Act as a content addressable memory.
Make induction easy.

      P0 P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13 P14 P15
Fido  +  _  _  +  _  _  _  _  +  +  +   +   +   +   _   _

A dog has 4 legs. How many legs does Fido have?
Advantages

Act as a content addressable memory.
Make induction easy.
Make the creation of new entities or concepts easy (without allocation of new hardware).

          P0 P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13 P14 P15
Doughnut  +  +  _  _  _  +  +  _  +  _  _   _   +   +   +   _

Add “doughnut” by changing weights.
Advantages

Act as a content addressable memory.
Make induction easy.
Make the creation of new entities or concepts easy (without allocation of new hardware).
Fault tolerance.

The breakdown of a few PEs does not cause problems.
Disadvantages

• How to understand it?
• How to modify it?

Learning procedures are required.