ISMP Lab New Student Training Course: Artificial Neural Networks
Advisor: Prof. 郭耀煌 | Master's student: 黃盛裕 (Class of 96)
2008/7/18
National Cheng Kung University / Walsin Lihwa Corp., Center for Research of E-life Digital Technology
Outline
Introduction
Single Layer Perceptron – Perceptron
Example
Single Layer Perceptron – Adaline
Multilayer Perceptron – Back–propagation neural network
Competitive Learning - Example
Radial Basis Function (RBF) Networks
Q&A and Homework
Artificial Neural Networks (ANN)
Artificial neural networks simulate the human brain and can approximate arbitrary nonlinear, complex functions with good accuracy.
Fig. 1, Fig. 2
Neural Networks vs. Computer
Table 1. Neural networks vs. computer.
Property                 Human brain              Computer
Processing elements      10^14 synapses           10^8 transistors
Element size             10^-6 m                  10^-6 m
Energy use               30 W                     30 W (CPU)
Processing speed         100 Hz                   10^9 Hz
Style of computation     parallel, distributed    serial, centralized
Fault tolerant           yes                      no
Learns                   yes                      a little
Intelligent, conscious   usually                  not (yet)
Biological neural networks
Fig. 3
About 10^11 neurons in the human brain
About 10^14-10^15 interconnections
Pulse-transmission frequency is a million times slower than that of electronic circuits
Even so, a human recognizes a face within a few hundred milliseconds, a task that remains hard for networks of artificial neurons despite their much faster per-operation speed
Applications of ANN
Fig. 4: application domains of neural networks, including pattern recognition, optimization, AI, signal processing, image processing, communication, control, prediction and economics, VLSI, power and energy, and bioinformatics.
Successful applications are found in well-constrained environments; none is flexible enough to perform well outside its domain.
Challenging Problems
(1) Pattern classification
(2) Clustering/categorization
(3) Function approximation
(4) Prediction/forecasting
(5) Optimization (e.g., the traveling salesman problem)
(6) Retrieval by content
(7) Control
Fig.5
Brief historical review
Three periods of extensive activity:
1940s: McCulloch and Pitts' pioneering work
1960s: Rosenblatt's perceptron convergence theorem; Minsky and Papert showed the limitations of the simple perceptron
1980s: Hopfield's energy approach (1982) and Werbos' back-propagation learning algorithm
Neuron vs. Artificial Neuron
McCulloch and Pitts proposed the MP neuron model in 1943.
Hebb learning rule: $W_{ij}^{new} = W_{ij}^{old} + \Delta W_{ij}$, where $\Delta W_{ij} = \eta\, X_i X_j$.
Fig. 6, Fig. 7
Outline
Introduction
Single Layer Perceptron – Perceptron
Example
Single Layer Perceptron – Adaline
Multilayer Perceptron – Back–propagation neural network
Competitive Learning - Example
Radial Basis Function (RBF) Networks
Q&A and Homework
Element of an Artificial Neuron
Inputs $x_1, \ldots, x_n$ arrive over weights (synapses) $w_{1j}, \ldots, w_{nj}$; a summation function combines them with the bias $\theta_j$, and a transfer function produces the output $Y_j$:
$net_j = \sum_i w_{ij} x_i - \theta_j, \qquad Y_j = f(net_j)$
Fig. 8: The McCulloch-Pitts model (1943).
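A minimal sketch of this computation in Python; the step transfer function and the sample numbers are illustrative, not from the slides:

```python
# One artificial neuron: net_j = sum_i(w_ij * x_i) - theta_j, Y_j = f(net_j).
def neuron(x, w, theta):
    net = sum(wi * xi for wi, xi in zip(w, x)) - theta  # summation function with bias
    return 1 if net > 0 else 0                          # threshold transfer function

print(neuron(x=[1.0, -1.0], w=[0.8, 0.2], theta=0.7))   # net = 0.8 - 0.2 - 0.7 = -0.1 -> 0
```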
Summation function
An adder sums the input signals, weighted by the respective synapses of the neuron. Two common forms:
Weighted summation: $I_j = \sum_i W_{ij} X_i$
Euclidean distance: $I_j = \sqrt{\sum_i (X_i - W_{ij})^2}$
Transfer functions
An activation function limits the amplitude of the output of a neuron.
1. Threshold (step) function:
$Y_j = \begin{cases} 1, & net_j \ge 0 \\ 0, & net_j < 0 \end{cases}$
2. Piecewise-linear function (saturating at $net_j = \pm 0.5$):
$Y_j = \begin{cases} 1, & net_j \ge 0.5 \\ net_j + 0.5, & -0.5 < net_j < 0.5 \\ 0, & net_j \le -0.5 \end{cases}$
Transfer functions
3. Sigmoid function:
$Y_j = \dfrac{1}{1 + \exp(-a \cdot net_j)}$, where a is the slope parameter of the sigmoid function.
4. Radial basis function:
$Y_j = \exp(-a \cdot net_j^2)$, where a is the variance parameter of the radial basis function.
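The four transfer functions as a Python sketch; the parameter names and the ±0.5 breakpoints follow the plots above:

```python
import math

def threshold(net):                  # 1. threshold (step) function
    return 1.0 if net >= 0 else 0.0

def piecewise_linear(net):           # 2. piecewise-linear, saturating at +/-0.5
    if net >= 0.5:
        return 1.0
    if net <= -0.5:
        return 0.0
    return net + 0.5

def sigmoid(net, a=1.0):             # 3. sigmoid; a is the slope parameter
    return 1.0 / (1.0 + math.exp(-a * net))

def radial_basis(net, a=1.0):        # 4. radial basis; a is the variance parameter
    return math.exp(-a * net ** 2)
```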
Network architectures
Fig. 9: A taxonomy of feed-forward and recurrent/feedback network architectures.
Network architectures
Feed-forward networks: static (produce only one set of output values) and memory-less (the output is independent of the previous network state).
Recurrent (or feedback) networks: dynamic systems.
Different architectures require different learning algorithms.
Learning process
The ability to learn is a fundamental trait of intelligence.
ANNs automatically learn from examples instead of following a set of rules specified by human experts; they appear to learn the underlying rules. This is their major advantage over traditional expert systems.
Learning process
A learning process requires:
1. A model of the environment
2. An understanding of how network weights are updated
Three main learning paradigms:
1. Supervised
2. Unsupervised
3. Hybrid
Learning process
Three fundamental and practical issues of learning theory:
1. Capacity: how many patterns, functions, and decision boundaries the network can represent
2. Sample complexity: the number of training samples needed (too few leads to over-fitting)
3. Computational complexity: the time required to learn (many learning algorithms have high complexity)
Learning process
Three basic types of learning rules:
1. Error-correction rules
2. Hebbian rule: if neurons on both sides of a synapse are activated synchronously and repeatedly, the synapse's strength is selectively increased:
$w_{ij}(t+1) = w_{ij}(t) + \eta\, y_j(t)\, x_i(t)$
3. Competitive learning rules
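One Hebbian step as a sketch (η = 0.1 is an illustrative learning rate):

```python
# Hebbian rule: w_ij(t+1) = w_ij(t) + eta * y_j(t) * x_i(t); synchronous
# pre-/post-synaptic activity (x and y active together) strengthens the synapse.
def hebb_update(w, x, y, eta=0.1):
    return w + eta * y * x

print(hebb_update(0.5, x=1.0, y=1.0))  # 0.6
```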
Table 2. Well-known learning algorithms. Each row: learning rule | architecture | learning algorithm | task.

Supervised
- Error-correction | single- or multilayer perceptron | perceptron learning algorithm, back-propagation, Adaline and Madaline | pattern classification, function approximation, prediction, control
- Hebbian | multilayer feed-forward | linear discriminant analysis | data analysis, pattern classification
- Competitive | competitive | learning vector quantization | within-class categorization, data compression
- Competitive | ART network | ARTMap | pattern classification, within-class categorization

Unsupervised
- Error-correction | multilayer feed-forward | Sammon's projection | data analysis
- Hebbian | feed-forward or competitive | principal component analysis | data analysis, data compression
- Hebbian | Hopfield network | associative memory learning | associative memory
- Competitive | competitive | vector quantization | categorization, data compression
- Competitive | Kohonen's SOM | Kohonen's SOM | categorization, data analysis
- Competitive | ART networks | ART1, ART2 | categorization

Hybrid
- Error-correction and competitive | RBF network | RBF learning algorithm | pattern classification, function approximation, prediction, control
Error-Correction Rules
The threshold function: if v > 0, then y = +1; otherwise y = 0, where
$v = \sum_{j=1}^{n} w_j x_j - u, \qquad y = f(v)$
Error-correction update: $w_j(t+1) = w_j(t) + \eta\,(d - y)\,x_j$, where d is the desired output.
Fig. 10: a single threshold unit (inputs x1…xn, weights w1…wn, net = Σj xj wj, threshold Θ).
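A single error-correction step as a sketch; all numbers are illustrative:

```python
def step(v):                                      # threshold function
    return 1 if v > 0 else 0

x, w, u, eta, d = [1.0, 0.0, 1.0], [1.0, -1.0, 0.0], 0.0, 0.1, 0
v = sum(wj * xj for wj, xj in zip(w, x)) - u
y = step(v)                                       # y = 1 but d = 0: correction needed
w = [wj + eta * (d - y) * xj for wj, xj in zip(w, x)]
print(w)                                          # [0.9, -1.0, -0.1]
```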
Learning mode
On-line (sequential) mode: update weights after each training example; more accurate; requires more computational time; faster learning convergence.
Off-line (batch) mode: update weights after presenting all the training data; less accurate; requires less computational time; requires extra storage.
Error-Correction Rules
However, a single-layer perceptron can only separate linearly separable patterns, as long as a monotonic activation function is used.
The back-propagation learning algorithm is also based on the error-correction principle.
Preprocessing for Neural Networks
Inputs are mapped into [-1, 1]; outputs are mapped into [0, 1].
Simple scaling: $V^{new} = V^{old} / k$
Min-max normalization: $V^{new} = Min + \dfrac{(Max - Min)(V^{old} - D_{min})}{D_{max} - D_{min}}$
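A sketch of the min-max mapping; the function name and sample values are illustrative:

```python
# Rescale a raw value from [d_min, d_max] into a target range such as
# [-1, 1] for inputs or [0, 1] for outputs.
def min_max(v, d_min, d_max, lo, hi):
    return lo + (hi - lo) * (v - d_min) / (d_max - d_min)

print(min_max(75.0, 0.0, 100.0, -1.0, 1.0))  # 0.5  (input mapping)
print(min_max(75.0, 0.0, 100.0, 0.0, 1.0))   # 0.75 (output mapping)
```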
Perceptron
Introduced by Rosenblatt in 1957. A single-layer perceptron network consists of one or more artificial neurons in parallel; each neuron provides one network output and is usually connected to all of the external (or environmental) inputs.
Supervised learning: MP neuron model + Hebb learning rule.
Fig. 11
Perceptron
Learning algorithm:
Output: $net_j = \sum_i W_{ij} X_i - \theta_j$, $\quad Y_j = f(net_j) = \begin{cases} 1, & net_j > 0 \\ 0, & net_j \le 0 \end{cases}$
Adjust weights and bias: with error $\delta_j = T_j - Y_j$,
$W_{ij}^{new} = W_{ij}^{old} + \Delta W_{ij}, \quad \Delta W_{ij} = \eta\, \delta_j X_i$
$\theta_j^{new} = \theta_j^{old} + \Delta \theta_j, \quad \Delta \theta_j = -\eta\, \delta_j$
Energy function: $E = \frac{1}{p} \sum_j \lvert T_j - Y_j \rvert$, where p is the number of input/output vector pairs in the training set.
Outline
Introduction
Single Layer Perceptron – Perceptron
Example
Single Layer Perceptron – Adaline
Multilayer Perceptron – Back–propagation neural network
Competitive Learning - Example
Radial Basis Function (RBF) Networks
Q&A and Homework
Perceptron Example by hand (1/11)
Use a two-layer perceptron to solve the AND problem:
X1 X2 Y
-1 -1 0
-1 1 0
1 -1 0
1 1 1
Initial parameters: η = 0.1, θ = 0.5, W13 = 1.0, W23 = -1.0
Fig. 12: output node X3 with inputs X1 and X2.
Perceptron Example by hand (2/11)
1st learning cycle
Input 1st example: X1 = -1, X2 = -1, T = 0
net = W13·X1 + W23·X2 - θ = -0.5, Y = 0
δ = T - Y = 0, so ΔW13 = ηδX1 = 0, ΔW23 = 0, Δθ = -ηδ = 0
Input 2nd-4th examples:
No X1 X2 T net Y δ ΔW13 ΔW23 Δθ
1 -1 -1 0 -0.5 0 0 0 0 0
2 -1 1 0 -2.5 0 0 0 0 0
3 1 -1 0 1.5 1 -1 -0.1 0.1 0.1
4 1 1 1 -0.5 0 1 0.1 0.1 -0.1
Sum: ΔW13 = 0, ΔW23 = 0.2, Δθ = 0
Perceptron Example by hand (3/11)
Adjust weights and bias: W13 = 1, W23 = -0.8, θ = 0.5
2nd learning cycle
No X1 X2 T net Y δ ΔW13 ΔW23 Δθ
1 -1 -1 0 -0.7 0 0 0 0 0
2 -1 1 0 -2.3 0 0 0 0 0
3 1 -1 0 1.3 1 -1 -0.1 0.1 0.1
4 1 1 1 -0.3 0 1 0.1 0.1 -0.1
Sum: ΔW13 = 0, ΔW23 = 0.2, Δθ = 0
Perceptron Example by hand (4/11)
Adjust weights and bias: W13 = 1, W23 = -0.6, θ = 0.5
3rd learning cycle
No X1 X2 T net Y δ ΔW13 ΔW23 Δθ
1 -1 -1 0 -0.9 0 0 0 0 0
2 -1 1 0 -2.1 0 0 0 0 0
3 1 -1 0 1.1 1 -1 -0.1 0.1 0.1
4 1 1 1 -0.1 0 1 0.1 0.1 -0.1
Sum: ΔW13 = 0, ΔW23 = 0.2, Δθ = 0
Perceptron Example by hand (5/11)
Adjust weights and bias: W13 = 1, W23 = -0.4, θ = 0.5
4th learning cycle
No X1 X2 T net Y δ ΔW13 ΔW23 Δθ
1 -1 -1 0 -1.1 0 0 0 0 0
2 -1 1 0 -1.9 0 0 0 0 0
3 1 -1 0 0.9 1 -1 -0.1 0.1 0.1
4 1 1 1 0.1 1 0 0 0 0
Sum: ΔW13 = -0.1, ΔW23 = 0.1, Δθ = 0.1
Perceptron Example by hand (6/11)
Adjust weights and bias: W13 = 0.9, W23 = -0.3, θ = 0.6
5th learning cycle
No X1 X2 T net Y δ ΔW13 ΔW23 Δθ
1 -1 -1 0 -1.2 0 0 0 0 0
2 -1 1 0 -1.8 0 0 0 0 0
3 1 -1 0 0.6 1 -1 -0.1 0.1 0.1
4 1 1 1 0 0 1 0.1 0.1 -0.1
Sum: ΔW13 = 0, ΔW23 = 0.2, Δθ = 0
Perceptron Example by hand (7/11)
Adjust weights and bias: W13 = 0.9, W23 = -0.1, θ = 0.6
6th learning cycle
No X1 X2 T net Y δ ΔW13 ΔW23 Δθ
1 -1 -1 0 -1.4 0 0 0 0 0
2 -1 1 0 -1.6 0 0 0 0 0
3 1 -1 0 0.4 1 -1 -0.1 0.1 0.1
4 1 1 1 0.2 1 0 0 0 0
Sum: ΔW13 = -0.1, ΔW23 = 0.1, Δθ = 0.1
Perceptron Example by hand (8/11)
Adjust weights and bias: W13 = 0.8, W23 = 0, θ = 0.7
7th learning cycle
No X1 X2 T net Y δ ΔW13 ΔW23 Δθ
1 -1 -1 0 -1.5 0 0 0 0 0
2 -1 1 0 -1.5 0 0 0 0 0
3 1 -1 0 0.1 1 -1 -0.1 0.1 0.1
4 1 1 1 0.1 1 0 0 0 0
Sum: ΔW13 = -0.1, ΔW23 = 0.1, Δθ = 0.1
Perceptron Example by hand (9/11)
Adjust weights and bias: W13 = 0.7, W23 = 0.1, θ = 0.8
8th learning cycle
No X1 X2 T net Y δ ΔW13 ΔW23 Δθ
1 -1 -1 0 -1.6 0 0 0 0 0
2 -1 1 0 -1.4 0 0 0 0 0
3 1 -1 0 -0.2 0 0 0 0 0
4 1 1 1 0 0 1 0.1 0.1 -0.1
Sum: ΔW13 = 0.1, ΔW23 = 0.1, Δθ = -0.1
Perceptron Example by hand (10/11)
Adjust weights and bias: W13 = 0.8, W23 = 0.2, θ = 0.7
9th learning cycle
No X1 X2 T net Y δ ΔW13 ΔW23 Δθ
1 -1 -1 0 -1.7 0 0 0 0 0
2 -1 1 0 -1.3 0 0 0 0 0
3 1 -1 0 -0.1 0 0 0 0 0
4 1 1 1 0.3 1 0 0 0 0
Sum: ΔW13 = 0, ΔW23 = 0, Δθ = 0
Perceptron Example by hand (11/11)
Weights and bias unchanged: W13 = 0.8, W23 = 0.2, θ = 0.7
10th learning cycle (no change, stop learning)
No X1 X2 T net Y δ ΔW13 ΔW23 Δθ
1 -1 -1 0 -1.7 0 0 0 0 0
2 -1 1 0 -1.3 0 0 0 0 0
3 1 -1 0 -0.1 0 0 0 0 0
4 1 1 1 0.3 1 0 0 0 0
Sum: ΔW13 = 0, ΔW23 = 0, Δθ = 0
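The whole hand computation can be reproduced with a short batch-mode sketch; the float tolerance and the loop cap are implementation details, not from the slides:

```python
# Batch-mode perceptron for the AND example: eta = 0.1, theta = 0.5,
# W13 = 1.0, W23 = -1.0; corrections are summed over a cycle, then applied.
samples = [((-1, -1), 0), ((-1, 1), 0), ((1, -1), 0), ((1, 1), 1)]
w13, w23, theta, eta = 1.0, -1.0, 0.5, 0.1

for cycle in range(1, 100):
    dw13 = dw23 = dtheta = 0.0
    for (x1, x2), t in samples:
        net = w13 * x1 + w23 * x2 - theta
        y = 1 if net > 1e-9 else 0          # tolerance guards float noise at net = 0
        delta = t - y
        dw13 += eta * delta * x1
        dw23 += eta * delta * x2
        dtheta += -eta * delta
    if dw13 == dw23 == dtheta == 0.0:       # a full cycle with no change: stop
        break
    w13, w23, theta = w13 + dw13, w23 + dw23, theta + dtheta

# Converges with W13 = 0.8, W23 = 0.2, theta = 0.7, matching the table above.
print(cycle, round(w13, 1), round(w23, 1), round(theta, 1))
```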
Example
Training pairs (input, desired output):
1. x1 = (1, 0, 1)T, y1 = -1
2. x2 = (0, -1, -1)T, y2 = 1
3. x3 = (-1, -0.5, -1)T, y3 = 1
The learning constant is assumed to be 0.1.
The initial weight vector is w0 = (1, -1, 0)T.
Fig.13
Step 1: <w0, x1> = (1, -1, 0)*(1, 0, 1)T = 1
Correction is needed since y1 = -1 ≠ sign (1)
w1 = w0 + 0.1*(-1-1)*x1
w1 = (1, -1, 0)T – 0.2*(1, 0, 1)T = (0.8, -1, -0.2)T
Step 2: <w1, x2> = 1.2
y2 = 1 = sign(1.2), so no correction: w2 = w1
Step 3: <w2, x3> = (0.8, -1, -0.2 )*(−1,−0.5,−1)T = -0.1
Correction is needed since y3 = 1 ≠ sign (-0.1)
w3 = w2 + 0.1*(1-(-1))*x3
w3 = (0.8, -1, -0.2)T + 0.2*(−1,−0.5,−1)T = (0.6, -1.1, -0.4)T
Step 4: <w3, x1> = (0.6, -1.1, -0.4)*(1, 0, 1)T = 0.2
Correction is needed since y1 = -1 ≠ sign (0.2)
w4 = w3 + 0.1*(-1-1)*x1
w4 = (0.6, -1.1, -0.4)T– 0.2*(1, 0, 1)T = (0.4, -1.1, -0.6)T
Step 5: <w4, x2> = 1.7
y2 = 1 = sign(1.7), so w5 = w4
Step 6: <w5, x3> = 0.75
y3 = 1 = sign(0.75), so w6 = w5
w6 terminates the learning process, since it classifies all three patterns correctly:
<w6, x1> = -0.2 < 0
<w6, x2> = 1.7 > 0
<w6, x3> = 0.75 > 0
Adaline
Architecture of Adaline
Applications: filtering, communication
Learning algorithm (Least Mean Square, LMS):
Y = purelin(ΣWX - b) = W1X1 + W2X2 - b
W(t+1) = W(t) + 2ηe(t)X(t)
b(t+1) = b(t) + 2ηe(t)
e(t) = T - Y
Fig. 14: Adaline architecture (input layer X1, X2; weights W1, W2; bias weight b on a constant -1 input; linear output Y).
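A runnable LMS sketch; following the diagram, the bias is treated as a weight b on a constant -1 input, so every weight shares the update W(t+1) = W(t) + 2ηe(t)X(t). The linear target function and learning rate are illustrative assumptions:

```python
import random

random.seed(1)
w = [0.0, 0.0, 0.0]                   # [W1, W2, b]; the input vector is (x1, x2, -1)
eta = 0.01

for _ in range(2000):
    x1, x2 = random.uniform(-1, 1), random.uniform(-1, 1)
    t = 2.0 * x1 - 1.0 * x2 + 0.5     # unknown linear target to be learned
    x = (x1, x2, -1.0)
    y = sum(wi * xi for wi, xi in zip(w, x))  # Y = purelin(W1*x1 + W2*x2 - b)
    e = t - y                                 # e(t) = T - Y
    w = [wi + 2 * eta * e * xi for wi, xi in zip(w, x)]

print([round(wi, 2) for wi in w])     # approaches [2.0, -1.0, -0.5]
```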
Perceptron in XOR problem
XOR problem: the two classes cannot be separated by a single line. A single-layer perceptron can realize AND and OR, but not XOR.
Fig.: decision regions for AND, OR, and XOR over inputs in {-1, 1} (○ and × mark the two output classes).
Outline
Introduction
Single Layer Perceptron – Perceptron
Example
Single Layer Perceptron – Adaline
Multilayer Perceptron – Back–propagation neural network
Competitive Learning - Example
Radial Basis Function (RBF) Networks
Q&A and Homework
Multilayer Feed-Forward Networks
Fig. 15: A taxonomy of feed-forward and recurrent/feedback network architectures.
Multilayer perceptron
Fig. 16: A typical three-layer feed-forward network architecture: input layer (x1, x2, …, xn), hidden layer, and output layer (y1, y2, …, yn), with weights Wqi(1), Wij(2), and Wjk(L).
Multilayer perceptron
The most popular class of multilayer feed-forward networks; it can form arbitrarily complex decision boundaries and can represent any Boolean function. Trained by back-propagation.
Let $\{(\mathbf{x}^{(1)}, \mathbf{d}^{(1)}), (\mathbf{x}^{(2)}, \mathbf{d}^{(2)}), \ldots, (\mathbf{x}^{(p)}, \mathbf{d}^{(p)})\}$ be a set of training patterns, with $\mathbf{x}^{(i)} \in \mathbb{R}^n$ and $\mathbf{d}^{(i)} \in [0,1]^m$.
Squared-error cost function: $E = \frac{1}{2} \sum_{i=1}^{p} \lVert \mathbf{y}^{(i)} - \mathbf{d}^{(i)} \rVert^2$
Fig. 17: A geometric interpretation of the role of hidden units in a two-dimensional input space.
Back-propagation neural network (BPN)
Proposed in 1985.
Architecture:
Fig. 18: BPN architecture (input layer, hidden layer, output layer; maps an input vector to an output vector).
BPN Algorithm
Uses the gradient steepest-descent method to reduce the error.
Energy function: $E = \frac{1}{2} \sum_j (T_j - Y_j)^2$; generic update: $\Delta W = -\eta \dfrac{\partial E}{\partial W}$
Hidden layer → output layer:
$\Delta W_{kj} = -\eta \dfrac{\partial E}{\partial W_{kj}} = -\eta \dfrac{\partial E}{\partial Y_j} \dfrac{\partial Y_j}{\partial net_j} \dfrac{\partial net_j}{\partial W_{kj}} = \eta\,(T_j - Y_j)\, f'(net_j)\, H_k = \eta\, \delta_j H_k$, with $\delta_j = (T_j - Y_j) f'(net_j)$
$\Delta \theta_j = -\eta \dfrac{\partial E}{\partial \theta_j} = -\eta\, \delta_j$
Input (or hidden) layer → hidden layer:
$\Delta W_{ik} = -\eta \dfrac{\partial E}{\partial W_{ik}} = \eta \Big( \sum_j \delta_j W_{kj} \Big) f'(net_k)\, X_i = \eta\, \delta_k X_i$
$\Delta \theta_k = -\eta\, \delta_k$
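A runnable sketch of these update rules for a 2-2-1 sigmoid network in on-line mode; the XOR task, learning rate, and initialization are illustrative assumptions, not from the slides:

```python
import math, random

random.seed(0)
f = lambda net: 1.0 / (1.0 + math.exp(-net))       # sigmoid; f'(net) = y * (1 - y)

W = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]  # W[i][k]: input i -> hidden k
th_h = [random.uniform(-1, 1) for _ in range(2)]   # hidden biases (net = sum - theta)
V = [random.uniform(-1, 1) for _ in range(2)]      # V[k]: hidden k -> output
th_o = random.uniform(-1, 1)
eta = 0.5
data = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]

def forward(x):
    h = [f(W[0][k] * x[0] + W[1][k] * x[1] - th_h[k]) for k in range(2)]
    return h, f(V[0] * h[0] + V[1] * h[1] - th_o)

for epoch in range(20000):
    for x, t in data:
        h, y = forward(x)
        d_o = (t - y) * y * (1.0 - y)              # delta_j = (T_j - Y_j) f'(net_j)
        d_h = [d_o * V[k] * h[k] * (1.0 - h[k]) for k in range(2)]  # (sum_j delta_j W_kj) f'(net_k)
        for k in range(2):
            V[k] += eta * d_o * h[k]               # dW_kj = eta * delta_j * H_k
            W[0][k] += eta * d_h[k] * x[0]         # dW_ik = eta * delta_k * X_i
            W[1][k] += eta * d_h[k] * x[1]
            th_h[k] += -eta * d_h[k]               # dtheta = -eta * delta
        th_o += -eta * d_o

print([round(forward(x)[1], 2) for x, _ in data])  # typically near [0, 1, 1, 0]
```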
Outline
Introduction
Single Layer Perceptron – Perceptron
Example
Single Layer Perceptron – Adaline
Multilayer Perceptron – Back–propagation neural network
Competitive Learning - Example
Radial Basis Function (RBF) Networks
Q&A and Homework
Competitive Learning Rules
Known as the winner-take-all method.
Unsupervised learning: it clusters or categorizes the input data.
Fig. 19: the simplest competitive learning network.
Competitive Learning Rules
A geometric interpretation of competitive learning
Fig. 20 (a) Before learning (b) after learning
Example
Fig. 21
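A generic winner-take-all sketch: the unit whose weight vector is nearest the input wins and moves toward it. The two clusters, unit count, and learning rate are illustrative, not taken from the figures:

```python
import random

random.seed(2)
units = [[0.1, 0.1], [0.9, 0.9]]                  # two competing weight vectors
eta = 0.1
# 100 samples around the cluster centers (0.2, 0.2) and (0.8, 0.8)
data = [[c + random.uniform(-0.1, 0.1) for c in centre]
        for centre in ([0.2, 0.2], [0.8, 0.8]) for _ in range(50)]
random.shuffle(data)

for x in data:
    d2 = [sum((xi - wi) ** 2 for xi, wi in zip(x, w)) for w in units]
    win = d2.index(min(d2))                        # competition: nearest unit wins
    units[win] = [wi + eta * (xi - wi)             # winner moves toward the input
                  for wi, xi in zip(units[win], x)]

print([[round(v, 2) for v in w] for w in units])   # units settle near the two centers
```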
Outline
Introduction
Single Layer Perceptron – Perceptron
Example
Single Layer Perceptron – Adaline
Multilayer Perceptron – Back–propagation neural network
Competitive Learning - Example
Radial Basis Function (RBF) Networks
Q&A and Homework
Radial Basis Function network
A special class of feed-forward networks. Origin: Cover's theorem.
Radial basis function (kernel function), e.g. the Gaussian function:
$\psi_i(\mathbf{x}, \mathbf{c}_i) = \exp\!\Big( -\dfrac{\lVert \mathbf{x} - \mathbf{c}_i \rVert^2}{2\sigma_i^2} \Big)$
Network output: $F(\mathbf{x}) = \sum_{i=1}^{K} w_i\, \psi_i(\mathbf{x}, \mathbf{c}_i)$
Fig. 22: RBF network with inputs x1, x2 feeding kernel units ψ1, ψ2.
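A sketch of the RBF forward pass; the centers, widths, and weights here are illustrative (in practice they are learned, as the next slide notes):

```python
import math

def psi(x, c, sigma):                 # Gaussian kernel psi_i(x, c_i)
    d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return math.exp(-d2 / (2.0 * sigma ** 2))

centers = [(0.0, 0.0), (1.0, 1.0)]
sigmas = [0.5, 0.5]
weights = [1.0, -1.0]

def F(x):                             # F(x) = sum_i w_i * psi_i(x, c_i)
    return sum(w * psi(x, c, s) for w, c, s in zip(weights, centers, sigmas))

print(round(F((0.0, 0.0)), 3))        # 0.982 = 1 - exp(-4)
```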
Radial Basis Function network
There are a variety of learning algorithms for the RBF network; the basic one is a two-step (hybrid) learning strategy.
It converges much faster than back-propagation but involves a larger number of hidden units, so its runtime speed (after training) is slower.
The relative efficiency of RBF networks and multilayer perceptrons is problem-dependent.
Issues
How many layers are needed for a given task?
How many units are needed per layer?
Generalization ability: how large should the training set be for 'good' generalization?
Although multilayer feed-forward networks are widely used, their parameters must still be identified by trial and error.
Journals
Neural Networks (The Official Journal of the International Neural Network Society, INNS)
IEEE Transactions on Neural Networks
International Journal of Neural Systems
Neurocomputing
Neural Computation
Books
Artificial Intelligence (AI)
- Artificial Intelligence: A Modern Approach (2nd Edition), Stuart J. Russell and Peter Norvig
Machine learning
- Machine Learning, Tom M. Mitchell
- Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence, Jyh-Shing Roger Jang, Chuen-Tsai Sun, and Eiji Mizutani
Neural networks
- 類神經網路模式應用與實作, 葉怡成
- 應用類神經網路, 葉怡成
- 類神經網路 – MATLAB 的應用, 羅華強
- Neural Networks: A Comprehensive Foundation (2nd Edition), Simon Haykin
- Neural Network Design, Martin T. Hagan, Howard B. Demuth, and Mark H. Beale
Genetic algorithms
- Genetic Algorithms in Search, Optimization, and Machine Learning, David E. Goldberg
- Genetic Algorithms + Data Structures = Evolution Programs, Zbigniew Michalewicz
- An Introduction to Genetic Algorithms for Scientists and Engineers, David A. Coley
Homework
1. Use a two-layer perceptron to solve the OR problem.
   a. Draw the topology (structure) of the neural network, including the number of nodes in each layer and the associated weight linkages.
   b. Discuss how the initial parameters (weights, bias, learning rate) affect the learning process.
   c. Discuss the difference between batch-mode learning and on-line learning.
2. Use a two-layer perceptron to solve the XOR problem.
   a. Discuss why the perceptron cannot solve the XOR problem.
Thanks