ISMP Lab New Student Training Course: Artificial Neural Networks
Advisor: Prof. 郭耀煌 | Master's student: 黃盛裕 (Class of 96)
2008/7/18
National Cheng Kung University / Walsin Lihwa Corp., Center for Research of E-life Digital Technology
Outline
Introduction
Single Layer Perceptron – Perceptron
Example
Single Layer Perceptron – Adaline
Multilayer Perceptron – Back–propagation neural network
Competitive Learning - Example
Radial Basis Function (RBF) Networks
Q&A and Homework
Artificial Neural Networks (ANN)
Artificial neural networks simulate the human brain and can approximate arbitrary nonlinear, complex functions with good accuracy.
Fig. 1, Fig. 2
Neural Networks vs. Computer
Table 1. Neural networks vs. computer.
Property                 Human brain              Computer
Processing elements      10^14 synapses           10^8 transistors
Element size             10^-6 m                  10^-6 m
Energy use               30 W                     30 W (CPU)
Processing speed         100 Hz                   10^9 Hz
Style of computation     parallel, distributed    serial, centralized
Fault tolerant           yes                      no
Learns                   yes                      a little
Intelligent, conscious   usually                  not (yet)
Biological neural networks
Fig. 3
About 10^11 neurons in the human brain
About 10^14-10^15 interconnections
Pulse-transmission frequency is a million times slower than that of electronic circuits
Even so, a human recognizes a face within a few hundred milliseconds, a task that remains hard for networks of artificial neurons despite their much faster per-operation speed
Applications of ANN
Fig. 4: application domains of neural networks, including pattern recognition, optimization, AI, signal processing, image processing, communication, control, prediction and economics, VLSI, power and energy, and bioinformatics.
Successful applications are found in well-constrained environments; none is flexible enough to perform well outside its domain.
Challenging Problems
(1) Pattern classification
(2) Clustering/categorization
(3) Function approximation
(4) Prediction/forecasting
(5) Optimization (e.g., the traveling salesman problem)
(6) Retrieval by content
(7) Control
Fig.5
Brief historical review
Three periods of extensive activity:
1940s: McCulloch and Pitts' pioneering work
1960s: Rosenblatt's perceptron convergence theorem; Minsky and Papert showed the limitations of the simple perceptron
1980s: Hopfield's energy approach (1982) and Werbos' back-propagation learning algorithm
Neuron vs. Artificial Neuron
McCulloch and Pitts proposed the MP neuron model in 1943.
Hebb learning rule: $W_{ij}^{new} = W_{ij}^{old} + \Delta W_{ij}$, where $\Delta W_{ij} = \eta\, X_i X_j$.
Fig. 6, Fig. 7
Outline
Introduction
Single Layer Perceptron – Perceptron
Example
Single Layer Perceptron – Adaline
Multilayer Perceptron – Back–propagation neural network
Competitive Learning - Example
Radial Basis Function (RBF) Networks
Q&A and Homework
Element of an Artificial Neuron
Inputs $x_1, \ldots, x_n$ arrive over weights (synapses) $w_{1j}, \ldots, w_{nj}$; a summation function combines them with the bias $\theta_j$, and a transfer function produces the output $Y_j$:
$net_j = \sum_i w_{ij} x_i - \theta_j, \qquad Y_j = f(net_j)$
Fig. 8: The McCulloch-Pitts model (1943).
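A minimal sketch of this computation in Python; the step transfer function and the sample numbers are illustrative, not from the slides:

```python
# One artificial neuron: net_j = sum_i(w_ij * x_i) - theta_j, Y_j = f(net_j).
def neuron(x, w, theta):
    net = sum(wi * xi for wi, xi in zip(w, x)) - theta  # summation function with bias
    return 1 if net > 0 else 0                          # threshold transfer function

print(neuron(x=[1.0, -1.0], w=[0.8, 0.2], theta=0.7))   # net = 0.8 - 0.2 - 0.7 = -0.1 -> 0
```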
Summation function
An adder sums the input signals, weighted by the respective synapses of the neuron. Two common forms:
Weighted summation: $I_j = \sum_i W_{ij} X_i$
Euclidean distance: $I_j = \sqrt{\sum_i (X_i - W_{ij})^2}$
Transfer functions
An activation function limits the amplitude of the output of a neuron.
1. Threshold (step) function:
$Y_j = \begin{cases} 1, & net_j \ge 0 \\ 0, & net_j < 0 \end{cases}$
2. Piecewise-linear function (saturating at $net_j = \pm 0.5$):
$Y_j = \begin{cases} 1, & net_j \ge 0.5 \\ net_j + 0.5, & -0.5 < net_j < 0.5 \\ 0, & net_j \le -0.5 \end{cases}$
Transfer functions
3. Sigmoid function:
$Y_j = \dfrac{1}{1 + \exp(-a \cdot net_j)}$, where a is the slope parameter of the sigmoid function.
4. Radial basis function:
$Y_j = \exp(-a \cdot net_j^2)$, where a is the variance parameter of the radial basis function.
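The four transfer functions as a Python sketch; the parameter names and the ±0.5 breakpoints follow the plots above:

```python
import math

def threshold(net):                  # 1. threshold (step) function
    return 1.0 if net >= 0 else 0.0

def piecewise_linear(net):           # 2. piecewise-linear, saturating at +/-0.5
    if net >= 0.5:
        return 1.0
    if net <= -0.5:
        return 0.0
    return net + 0.5

def sigmoid(net, a=1.0):             # 3. sigmoid; a is the slope parameter
    return 1.0 / (1.0 + math.exp(-a * net))

def radial_basis(net, a=1.0):        # 4. radial basis; a is the variance parameter
    return math.exp(-a * net ** 2)
```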
Network architectures
Fig. 9: A taxonomy of feed-forward and recurrent/feedback network architectures.
Network architectures
Feed-forward networks: static (produce only one set of output values) and memory-less (the output is independent of the previous network state).
Recurrent (or feedback) networks: dynamic systems.
Different architectures require different learning algorithms.
Learning process
The ability to learn is a fundamental trait of intelligence.
ANNs automatically learn from examples instead of following a set of rules specified by human experts; they appear to learn the underlying rules. This is their major advantage over traditional expert systems.
Learning process
A learning process requires:
1. A model of the environment
2. An understanding of how network weights are updated
Three main learning paradigms:
1. Supervised
2. Unsupervised
3. Hybrid
Learning process
Three fundamental and practical issues of learning theory:
1. Capacity: how many patterns, functions, and decision boundaries the network can represent
2. Sample complexity: the number of training samples needed (too few leads to over-fitting)
3. Computational complexity: the time required to learn (many learning algorithms have high complexity)
Learning process
Three basic types of learning rules:
1. Error-correction rules
2. Hebbian rule: if neurons on both sides of a synapse are activated synchronously and repeatedly, the synapse's strength is selectively increased:
$w_{ij}(t+1) = w_{ij}(t) + \eta\, y_j(t)\, x_i(t)$
3. Competitive learning rules
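One Hebbian step as a sketch (η = 0.1 is an illustrative learning rate):

```python
# Hebbian rule: w_ij(t+1) = w_ij(t) + eta * y_j(t) * x_i(t); synchronous
# pre-/post-synaptic activity (x and y active together) strengthens the synapse.
def hebb_update(w, x, y, eta=0.1):
    return w + eta * y * x

print(hebb_update(0.5, x=1.0, y=1.0))  # 0.6
```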
Table 2. Well-known learning algorithms. Each row: learning rule | architecture | learning algorithm | task.

Supervised
- Error-correction | single- or multilayer perceptron | perceptron learning algorithm, back-propagation, Adaline and Madaline | pattern classification, function approximation, prediction, control
- Hebbian | multilayer feed-forward | linear discriminant analysis | data analysis, pattern classification
- Competitive | competitive | learning vector quantization | within-class categorization, data compression
- Competitive | ART network | ARTMap | pattern classification, within-class categorization

Unsupervised
- Error-correction | multilayer feed-forward | Sammon's projection | data analysis
- Hebbian | feed-forward or competitive | principal component analysis | data analysis, data compression
- Hebbian | Hopfield network | associative memory learning | associative memory
- Competitive | competitive | vector quantization | categorization, data compression
- Competitive | Kohonen's SOM | Kohonen's SOM | categorization, data analysis
- Competitive | ART networks | ART1, ART2 | categorization

Hybrid
- Error-correction and competitive | RBF network | RBF learning algorithm | pattern classification, function approximation, prediction, control
Error-Correction Rules
The threshold function: if v > 0, then y = +1; otherwise y = 0, where
$v = \sum_{j=1}^{n} w_j x_j - u, \qquad y = f(v)$
Error-correction update: $w_j(t+1) = w_j(t) + \eta\,(d - y)\,x_j$, where d is the desired output.
Fig. 10: a single threshold unit (inputs x1…xn, weights w1…wn, net = Σj xj wj, threshold Θ).
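A single error-correction step as a sketch; all numbers are illustrative:

```python
def step(v):                                      # threshold function
    return 1 if v > 0 else 0

x, w, u, eta, d = [1.0, 0.0, 1.0], [1.0, -1.0, 0.0], 0.0, 0.1, 0
v = sum(wj * xj for wj, xj in zip(w, x)) - u
y = step(v)                                       # y = 1 but d = 0: correction needed
w = [wj + eta * (d - y) * xj for wj, xj in zip(w, x)]
print(w)                                          # [0.9, -1.0, -0.1]
```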
Learning mode
On-line (sequential) mode: update weights after each training example; more accurate; requires more computational time; faster learning convergence.
Off-line (batch) mode: update weights after presenting all the training data; less accurate; requires less computational time; requires extra storage.
Error-Correction Rules
However, a single-layer perceptron can only separate linearly separable patterns, as long as a monotonic activation function is used.
The back-propagation learning algorithm is also based on the error-correction principle.
Preprocessing for Neural Networks
Inputs are mapped into [-1, 1]; outputs are mapped into [0, 1].
Simple scaling: $V^{new} = V^{old} / k$
Min-max normalization: $V^{new} = Min + \dfrac{(Max - Min)(V^{old} - D_{min})}{D_{max} - D_{min}}$
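A sketch of the min-max mapping; the function name and sample values are illustrative:

```python
# Rescale a raw value from [d_min, d_max] into a target range such as
# [-1, 1] for inputs or [0, 1] for outputs.
def min_max(v, d_min, d_max, lo, hi):
    return lo + (hi - lo) * (v - d_min) / (d_max - d_min)

print(min_max(75.0, 0.0, 100.0, -1.0, 1.0))  # 0.5  (input mapping)
print(min_max(75.0, 0.0, 100.0, 0.0, 1.0))   # 0.75 (output mapping)
```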
Perceptron
Introduced by Rosenblatt in 1957. A single-layer perceptron network consists of one or more artificial neurons in parallel; each neuron provides one network output and is usually connected to all of the external (or environmental) inputs.
Supervised learning: MP neuron model + Hebb learning rule.
Fig. 11
Perceptron
Learning algorithm:
Output: $net_j = \sum_i W_{ij} X_i - \theta_j$, $\quad Y_j = f(net_j) = \begin{cases} 1, & net_j > 0 \\ 0, & net_j \le 0 \end{cases}$
Adjust weights and bias: with error $\delta_j = T_j - Y_j$,
$W_{ij}^{new} = W_{ij}^{old} + \Delta W_{ij}, \quad \Delta W_{ij} = \eta\, \delta_j X_i$
$\theta_j^{new} = \theta_j^{old} + \Delta \theta_j, \quad \Delta \theta_j = -\eta\, \delta_j$
Energy function: $E = \frac{1}{p} \sum_j \lvert T_j - Y_j \rvert$, where p is the number of input/output vector pairs in the training set.
Outline
Introduction
Single Layer Perceptron – Perceptron
Example
Single Layer Perceptron – Adaline
Multilayer Perceptron – Back–propagation neural network
Competitive Learning - Example
Radial Basis Function (RBF) Networks
Q&A and Homework
Perceptron Example by hand (1/11)
Use a two-layer perceptron to solve the AND problem:
X1 X2 Y
-1 -1 0
-1 1 0
1 -1 0
1 1 1
Initial parameters: η = 0.1, θ = 0.5, W13 = 1.0, W23 = -1.0
Fig. 12: output node X3 with inputs X1 and X2.
Perceptron Example by hand (2/11)
1st learning cycle
Input 1st example: X1 = -1, X2 = -1, T = 0
net = W13·X1 + W23·X2 - θ = -0.5, Y = 0
δ = T - Y = 0, so ΔW13 = ηδX1 = 0, ΔW23 = 0, Δθ = -ηδ = 0
Input 2nd-4th examples:
No X1 X2 T net Y δ ΔW13 ΔW23 Δθ
1 -1 -1 0 -0.5 0 0 0 0 0
2 -1 1 0 -2.5 0 0 0 0 0
3 1 -1 0 1.5 1 -1 -0.1 0.1 0.1
4 1 1 1 -0.5 0 1 0.1 0.1 -0.1
Sum: ΔW13 = 0, ΔW23 = 0.2, Δθ = 0
Perceptron Example by hand (3/11)
Adjust weights and bias: W13 = 1, W23 = -0.8, θ = 0.5
2nd learning cycle
No X1 X2 T net Y δ ΔW13 ΔW23 Δθ
1 -1 -1 0 -0.7 0 0 0 0 0
2 -1 1 0 -2.3 0 0 0 0 0
3 1 -1 0 1.3 1 -1 -0.1 0.1 0.1
4 1 1 1 -0.3 0 1 0.1 0.1 -0.1
Sum: ΔW13 = 0, ΔW23 = 0.2, Δθ = 0
Perceptron Example by hand (4/11)
Adjust weights and bias: W13 = 1, W23 = -0.6, θ = 0.5
3rd learning cycle
No X1 X2 T net Y δ ΔW13 ΔW23 Δθ
1 -1 -1 0 -0.9 0 0 0 0 0
2 -1 1 0 -2.1 0 0 0 0 0
3 1 -1 0 1.1 1 -1 -0.1 0.1 0.1
4 1 1 1 -0.1 0 1 0.1 0.1 -0.1
Sum: ΔW13 = 0, ΔW23 = 0.2, Δθ = 0
Perceptron Example by hand (5/11)
Adjust weights and bias: W13 = 1, W23 = -0.4, θ = 0.5
4th learning cycle
No X1 X2 T net Y δ ΔW13 ΔW23 Δθ
1 -1 -1 0 -1.1 0 0 0 0 0
2 -1 1 0 -1.9 0 0 0 0 0
3 1 -1 0 0.9 1 -1 -0.1 0.1 0.1
4 1 1 1 0.1 1 0 0 0 0
Sum: ΔW13 = -0.1, ΔW23 = 0.1, Δθ = 0.1
Perceptron Example by hand (6/11)
Adjust weights and bias: W13 = 0.9, W23 = -0.3, θ = 0.6
5th learning cycle
No X1 X2 T net Y δ ΔW13 ΔW23 Δθ
1 -1 -1 0 -1.2 0 0 0 0 0
2 -1 1 0 -1.8 0 0 0 0 0
3 1 -1 0 0.6 1 -1 -0.1 0.1 0.1
4 1 1 1 0 0 1 0.1 0.1 -0.1
Sum: ΔW13 = 0, ΔW23 = 0.2, Δθ = 0
Perceptron Example by hand (7/11)
Adjust weights and bias: W13 = 0.9, W23 = -0.1, θ = 0.6
6th learning cycle
No X1 X2 T net Y δ ΔW13 ΔW23 Δθ
1 -1 -1 0 -1.4 0 0 0 0 0
2 -1 1 0 -1.6 0 0 0 0 0
3 1 -1 0 0.4 1 -1 -0.1 0.1 0.1
4 1 1 1 0.2 1 0 0 0 0
Sum: ΔW13 = -0.1, ΔW23 = 0.1, Δθ = 0.1
Perceptron Example by hand (8/11)
Adjust weights and bias: W13 = 0.8, W23 = 0, θ = 0.7
7th learning cycle
No X1 X2 T net Y δ ΔW13 ΔW23 Δθ
1 -1 -1 0 -1.5 0 0 0 0 0
2 -1 1 0 -1.5 0 0 0 0 0
3 1 -1 0 0.1 1 -1 -0.1 0.1 0.1
4 1 1 1 0.1 1 0 0 0 0
Sum: ΔW13 = -0.1, ΔW23 = 0.1, Δθ = 0.1
Perceptron Example by hand (9/11)
Adjust weights and bias: W13 = 0.7, W23 = 0.1, θ = 0.8
8th learning cycle
No X1 X2 T net Y δ ΔW13 ΔW23 Δθ
1 -1 -1 0 -1.6 0 0 0 0 0
2 -1 1 0 -1.4 0 0 0 0 0
3 1 -1 0 -0.2 0 0 0 0 0
4 1 1 1 0 0 1 0.1 0.1 -0.1
Sum: ΔW13 = 0.1, ΔW23 = 0.1, Δθ = -0.1
Perceptron Example by hand (10/11)
Adjust weights and bias: W13 = 0.8, W23 = 0.2, θ = 0.7
9th learning cycle
No X1 X2 T net Y δ ΔW13 ΔW23 Δθ
1 -1 -1 0 -1.7 0 0 0 0 0
2 -1 1 0 -1.3 0 0 0 0 0
3 1 -1 0 -0.1 0 0 0 0 0
4 1 1 1 0.3 1 0 0 0 0
Sum: ΔW13 = 0, ΔW23 = 0, Δθ = 0
Perceptron Example by hand (11/11)
Weights and bias unchanged: W13 = 0.8, W23 = 0.2, θ = 0.7
10th learning cycle (no change, stop learning)
No X1 X2 T net Y δ ΔW13 ΔW23 Δθ
1 -1 -1 0 -1.7 0 0 0 0 0
2 -1 1 0 -1.3 0 0 0 0 0
3 1 -1 0 -0.1 0 0 0 0 0
4 1 1 1 0.3 1 0 0 0 0
Sum: ΔW13 = 0, ΔW23 = 0, Δθ = 0
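The whole hand computation can be reproduced with a short batch-mode sketch; the float tolerance and the loop cap are implementation details, not from the slides:

```python
# Batch-mode perceptron for the AND example: eta = 0.1, theta = 0.5,
# W13 = 1.0, W23 = -1.0; corrections are summed over a cycle, then applied.
samples = [((-1, -1), 0), ((-1, 1), 0), ((1, -1), 0), ((1, 1), 1)]
w13, w23, theta, eta = 1.0, -1.0, 0.5, 0.1

for cycle in range(1, 100):
    dw13 = dw23 = dtheta = 0.0
    for (x1, x2), t in samples:
        net = w13 * x1 + w23 * x2 - theta
        y = 1 if net > 1e-9 else 0          # tolerance guards float noise at net = 0
        delta = t - y
        dw13 += eta * delta * x1
        dw23 += eta * delta * x2
        dtheta += -eta * delta
    if dw13 == dw23 == dtheta == 0.0:       # a full cycle with no change: stop
        break
    w13, w23, theta = w13 + dw13, w23 + dw23, theta + dtheta

# Converges with W13 = 0.8, W23 = 0.2, theta = 0.7, matching the table above.
print(cycle, round(w13, 1), round(w23, 1), round(theta, 1))
```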
Example
Training pairs (input, desired output):
1. x1 = (1, 0, 1)T, y1 = -1
2. x2 = (0, -1, -1)T, y2 = 1
3. x3 = (-1, -0.5, -1)T, y3 = 1
The learning constant is assumed to be 0.1.
The initial weight vector is w0 = (1, -1, 0)T.
Fig.13
Step 1: <w0, x1> = (1, -1, 0)*(1, 0, 1)T = 1
Correction is needed since y1 = -1 ≠ sign (1)
w1 = w0 + 0.1*(-1-1)*x1
w1 = (1, -1, 0)T – 0.2*(1, 0, 1)T = (0.8, -1, -0.2)T
Step 2: <w1, x2> = 1.2
y2 = 1 = sign(1.2), so no correction: w2 = w1
Step 3: <w2, x3> = (0.8, -1, -0.2 )*(−1,−0.5,−1)T = -0.1
Correction is needed since y3 = 1 ≠ sign (-0.1)
w3 = w2 + 0.1*(1-(-1))*x3
w3 = (0.8, -1, -0.2)T + 0.2*(−1,−0.5,−1)T = (0.6, -1.1, -0.4)T
Step 4: <w3, x1> = (0.6, -1.1, -0.4)*(1, 0, 1)T = 0.2
Correction is needed since y1 = -1 ≠ sign (0.2)
w4 = w3 + 0.1*(-1-1)*x1
w4 = (0.6, -1.1, -0.4)T– 0.2*(1, 0, 1)T = (0.4, -1.1, -0.6)T
Step 5: <w4, x2> = 1.7
y2 = 1 = sign(1.7), so w5 = w4
Step 6: <w5, x3> = 0.75
y3 = 1 = sign(0.75), so w6 = w5
w6 terminates the learning process, since it classifies all three patterns correctly:
<w6, x1> = -0.2 < 0
<w6, x2> = 1.7 > 0
<w6, x3> = 0.75 > 0
Adaline
Architecture of Adaline
Applications: filtering, communication
Learning algorithm (Least Mean Square, LMS):
Y = purelin(ΣWX - b) = W1X1 + W2X2 - b
W(t+1) = W(t) + 2ηe(t)X(t)
b(t+1) = b(t) + 2ηe(t)
e(t) = T - Y
Fig. 14: Adaline architecture (input layer X1, X2; weights W1, W2; bias weight b on a constant -1 input; linear output Y).
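A runnable LMS sketch; following the diagram, the bias is treated as a weight b on a constant -1 input, so every weight shares the update W(t+1) = W(t) + 2ηe(t)X(t). The linear target function and learning rate are illustrative assumptions:

```python
import random

random.seed(1)
w = [0.0, 0.0, 0.0]                   # [W1, W2, b]; the input vector is (x1, x2, -1)
eta = 0.01

for _ in range(2000):
    x1, x2 = random.uniform(-1, 1), random.uniform(-1, 1)
    t = 2.0 * x1 - 1.0 * x2 + 0.5     # unknown linear target to be learned
    x = (x1, x2, -1.0)
    y = sum(wi * xi for wi, xi in zip(w, x))  # Y = purelin(W1*x1 + W2*x2 - b)
    e = t - y                                 # e(t) = T - Y
    w = [wi + 2 * eta * e * xi for wi, xi in zip(w, x)]

print([round(wi, 2) for wi in w])     # approaches [2.0, -1.0, -0.5]
```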
Perceptron in XOR problem
XOR problem: the two classes cannot be separated by a single line. A single-layer perceptron can realize AND and OR, but not XOR.
Fig.: decision regions for AND, OR, and XOR over inputs in {-1, 1} (○ and × mark the two output classes).
Outline
Introduction
Single Layer Perceptron – Perceptron
Example
Single Layer Perceptron – Adaline
Multilayer Perceptron – Back–propagation neural network
Competitive Learning - Example
Radial Basis Function (RBF) Networks
Q&A and Homework
Multilayer Feed-Forward Networks
Fig. 15: A taxonomy of feed-forward and recurrent/feedback network architectures.
Multilayer perceptron
Fig. 16: A typical three-layer feed-forward network architecture: input layer (x1, x2, …, xn), hidden layer, and output layer (y1, y2, …, yn), with weights Wqi(1), Wij(2), and Wjk(L).
Multilayer perceptron
The most popular class of multilayer feed-forward networks; it can form arbitrarily complex decision boundaries and can represent any Boolean function. Trained by back-propagation.
Let $\{(\mathbf{x}^{(1)}, \mathbf{d}^{(1)}), (\mathbf{x}^{(2)}, \mathbf{d}^{(2)}), \ldots, (\mathbf{x}^{(p)}, \mathbf{d}^{(p)})\}$ be a set of training patterns, with $\mathbf{x}^{(i)} \in \mathbb{R}^n$ and $\mathbf{d}^{(i)} \in [0,1]^m$.
Squared-error cost function: $E = \frac{1}{2} \sum_{i=1}^{p} \lVert \mathbf{y}^{(i)} - \mathbf{d}^{(i)} \rVert^2$
Fig. 17: A geometric interpretation of the role of hidden units in a two-dimensional input space.
Back-propagation neural network (BPN)
Proposed in 1985.
Architecture:
Fig. 18: BPN architecture (input layer, hidden layer, output layer; maps an input vector to an output vector).
BPN Algorithm
Uses the gradient steepest-descent method to reduce the error.
Energy function: $E = \frac{1}{2} \sum_j (T_j - Y_j)^2$; generic update: $\Delta W = -\eta \dfrac{\partial E}{\partial W}$
Hidden layer → output layer:
$\Delta W_{kj} = -\eta \dfrac{\partial E}{\partial W_{kj}} = -\eta \dfrac{\partial E}{\partial Y_j} \dfrac{\partial Y_j}{\partial net_j} \dfrac{\partial net_j}{\partial W_{kj}} = \eta\,(T_j - Y_j)\, f'(net_j)\, H_k = \eta\, \delta_j H_k$, with $\delta_j = (T_j - Y_j) f'(net_j)$
$\Delta \theta_j = -\eta \dfrac{\partial E}{\partial \theta_j} = -\eta\, \delta_j$
Input (or hidden) layer → hidden layer:
$\Delta W_{ik} = -\eta \dfrac{\partial E}{\partial W_{ik}} = \eta \Big( \sum_j \delta_j W_{kj} \Big) f'(net_k)\, X_i = \eta\, \delta_k X_i$
$\Delta \theta_k = -\eta\, \delta_k$
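A runnable sketch of these update rules for a 2-2-1 sigmoid network in on-line mode; the XOR task, learning rate, and initialization are illustrative assumptions, not from the slides:

```python
import math, random

random.seed(0)
f = lambda net: 1.0 / (1.0 + math.exp(-net))       # sigmoid; f'(net) = y * (1 - y)

W = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]  # W[i][k]: input i -> hidden k
th_h = [random.uniform(-1, 1) for _ in range(2)]   # hidden biases (net = sum - theta)
V = [random.uniform(-1, 1) for _ in range(2)]      # V[k]: hidden k -> output
th_o = random.uniform(-1, 1)
eta = 0.5
data = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]

def forward(x):
    h = [f(W[0][k] * x[0] + W[1][k] * x[1] - th_h[k]) for k in range(2)]
    return h, f(V[0] * h[0] + V[1] * h[1] - th_o)

for epoch in range(20000):
    for x, t in data:
        h, y = forward(x)
        d_o = (t - y) * y * (1.0 - y)              # delta_j = (T_j - Y_j) f'(net_j)
        d_h = [d_o * V[k] * h[k] * (1.0 - h[k]) for k in range(2)]  # (sum_j delta_j W_kj) f'(net_k)
        for k in range(2):
            V[k] += eta * d_o * h[k]               # dW_kj = eta * delta_j * H_k
            W[0][k] += eta * d_h[k] * x[0]         # dW_ik = eta * delta_k * X_i
            W[1][k] += eta * d_h[k] * x[1]
            th_h[k] += -eta * d_h[k]               # dtheta = -eta * delta
        th_o += -eta * d_o

print([round(forward(x)[1], 2) for x, _ in data])  # typically near [0, 1, 1, 0]
```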
Outline
Introduction
Single Layer Perceptron – Perceptron
Example
Single Layer Perceptron – Adaline
Multilayer Perceptron – Back–propagation neural network
Competitive Learning - Example
Radial Basis Function (RBF) Networks
Q&A and Homework
Competitive Learning Rules
Known as the winner-take-all method.
Unsupervised learning: it clusters or categorizes the input data.
Fig. 19: the simplest competitive learning network.
Competitive Learning Rules
A geometric interpretation of competitive learning
Fig. 20 (a) Before learning (b) after learning
Example
Fig. 21
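A generic winner-take-all sketch: the unit whose weight vector is nearest the input wins and moves toward it. The two clusters, unit count, and learning rate are illustrative, not taken from the figures:

```python
import random

random.seed(2)
units = [[0.1, 0.1], [0.9, 0.9]]                  # two competing weight vectors
eta = 0.1
# 100 samples around the cluster centers (0.2, 0.2) and (0.8, 0.8)
data = [[c + random.uniform(-0.1, 0.1) for c in centre]
        for centre in ([0.2, 0.2], [0.8, 0.8]) for _ in range(50)]
random.shuffle(data)

for x in data:
    d2 = [sum((xi - wi) ** 2 for xi, wi in zip(x, w)) for w in units]
    win = d2.index(min(d2))                        # competition: nearest unit wins
    units[win] = [wi + eta * (xi - wi)             # winner moves toward the input
                  for wi, xi in zip(units[win], x)]

print([[round(v, 2) for v in w] for w in units])   # units settle near the two centers
```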
Outline
Introduction
Single Layer Perceptron – Perceptron
Example
Single Layer Perceptron – Adaline
Multilayer Perceptron – Back–propagation neural network
Competitive Learning - Example
Radial Basis Function (RBF) Networks
Q&A and Homework
Radial Basis Function network
A special class of feed-forward networks. Origin: Cover's theorem.
Radial basis function (kernel function), e.g. the Gaussian function:
$\psi_i(\mathbf{x}, \mathbf{c}_i) = \exp\!\Big( -\dfrac{\lVert \mathbf{x} - \mathbf{c}_i \rVert^2}{2\sigma_i^2} \Big)$
Network output: $F(\mathbf{x}) = \sum_{i=1}^{K} w_i\, \psi_i(\mathbf{x}, \mathbf{c}_i)$
Fig. 22: RBF network with inputs x1, x2 feeding kernel units ψ1, ψ2.
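A sketch of the RBF forward pass; the centers, widths, and weights here are illustrative (in practice they are learned, as the next slide notes):

```python
import math

def psi(x, c, sigma):                 # Gaussian kernel psi_i(x, c_i)
    d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return math.exp(-d2 / (2.0 * sigma ** 2))

centers = [(0.0, 0.0), (1.0, 1.0)]
sigmas = [0.5, 0.5]
weights = [1.0, -1.0]

def F(x):                             # F(x) = sum_i w_i * psi_i(x, c_i)
    return sum(w * psi(x, c, s) for w, c, s in zip(weights, centers, sigmas))

print(round(F((0.0, 0.0)), 3))        # 0.982 = 1 - exp(-4)
```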
Radial Basis Function network
There are a variety of learning algorithms for the RBF network; the basic one is a two-step (hybrid) learning strategy.
It converges much faster than back-propagation but involves a larger number of hidden units, so its runtime speed (after training) is slower.
The relative efficiency of RBF networks and multilayer perceptrons is problem-dependent.
Issues
How many layers are needed for a given task?
How many units are needed per layer?
Generalization ability: how large should the training set be for 'good' generalization?
Although multilayer feed-forward networks are widely used, their parameters must still be identified by trial and error.
Journals
Neural Networks (The Official Journal of the International Neural Network Society, INNS)
IEEE Transactions on Neural Networks
International Journal of Neural Systems
Neurocomputing
Neural Computation
Books
Artificial Intelligence (AI)
- Artificial Intelligence: A Modern Approach (2nd Edition), Stuart J. Russell and Peter Norvig
Machine learning
- Machine Learning, Tom M. Mitchell
- Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence, Jyh-Shing Roger Jang, Chuen-Tsai Sun, and Eiji Mizutani
Neural networks
- 類神經網路模式應用與實作, 葉怡成
- 應用類神經網路, 葉怡成
- 類神經網路 – MATLAB 的應用, 羅華強
- Neural Networks: A Comprehensive Foundation (2nd Edition), Simon Haykin
- Neural Network Design, Martin T. Hagan, Howard B. Demuth, and Mark H. Beale
Genetic algorithms
- Genetic Algorithms in Search, Optimization, and Machine Learning, David E. Goldberg
- Genetic Algorithms + Data Structures = Evolution Programs, Zbigniew Michalewicz
- An Introduction to Genetic Algorithms for Scientists and Engineers, David A. Coley
Homework
1. Use a two-layer perceptron to solve the OR problem.
   a. Draw the topology (structure) of the neural network, including the number of nodes in each layer and the associated weight linkages.
   b. Discuss how the initial parameters (weights, bias, learning rate) affect the learning process.
   c. Discuss the difference between batch-mode learning and on-line learning.
2. Use a two-layer perceptron to solve the XOR problem.
   a. Discuss why the perceptron cannot solve the XOR problem.
Thanks