Introduction to Artificial Neural Networks

Lecturer: 虞台文

Content
• Fundamental Concepts of ANNs
• Basic Models and Learning Rules
  – Neuron Models
  – ANN Structures
  – Learning
• Distributed Representations
• Conclusions


Fundamental Concepts of ANNs

What is an ANN? Why ANNs?

ANN: Artificial Neural Networks
– Simulate the behavior of the human brain.
– A new generation of information-processing systems.

Applications
• Pattern Matching
• Pattern Recognition
• Associative Memory (Content-Addressable Memory)
• Function Approximation
• Learning
• Optimization
• Vector Quantization
• Data Clustering
• . . .

Traditional computers are inefficient at these tasks, even though their raw computation speed is higher.

The Configuration of ANNs

An ANN consists of a large number of interconnected processing elements called neurons.
– A human brain consists of about 10^11 neurons of many different types.

How does an ANN work?
– Through collective behavior.

The Biological Neuron

[Figure: a biological neuron, labeled with its dendrites, its axon, and the synapse (the junction between the nerve fibers of two neurons).]

Synapses can be excitatory or inhibitory.

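The Artificial Neuron

[Figure: neuron i receives inputs x1, x2, …, xm through weights wi1, wi2, …, wim; an integration function f(·) and an activation function a(·), with threshold θi, produce the output yi.]

$$f_i = \sum_{j=1}^{m} w_{ij} x_j - \theta_i$$

$$y_i(t+1) = a(f_i)$$

$$a(f) = \begin{cases} 1 & f \ge 0 \\ 0 & \text{otherwise} \end{cases}$$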



The sign of the weight wij determines the type of connection:
• positive: excitatory
• negative: inhibitory
• zero: no connection



Proposed by McCulloch and Pitts [1943]: M-P neurons.
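The M-P neuron defined above fits in a few lines of Python. This is a minimal sketch, not code from the lecture. The function name `mp_neuron` and the weights and threshold in the usage lines are hypothetical values. They were chosen only to show excitatory (positive) and inhibitory (negative) connections at work.

```python
def mp_neuron(x, w, theta):
    """McCulloch-Pitts neuron: output 1 if the weighted input reaches the threshold."""
    # Integration: f = sum_j w_j * x_j - theta
    f = sum(w_j * x_j for w_j, x_j in zip(w, x)) - theta
    # Step activation: a(f) = 1 if f >= 0, otherwise 0
    return 1 if f >= 0 else 0


# Hypothetical neuron: inputs 1 and 2 are excitatory (w > 0), input 3 is inhibitory (w < 0).
w = [1.0, 1.0, -1.0]
theta = 1.0
print(mp_neuron([1, 0, 0], w, theta))  # 1: f = 1 - 1 = 0
print(mp_neuron([1, 0, 1], w, theta))  # 0: the inhibitory input pulls f down to -1
```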

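What can be done by M-P neurons?
• A hard limiter.
• A binary threshold unit.
• Hyperspace separation.

For a neuron with two inputs:

$$f(\cdot) = w_1 x_1 + w_2 x_2 - \theta, \qquad y = \begin{cases} 1 & f(\cdot) \ge 0 \\ 0 & \text{otherwise} \end{cases}$$

[Figure: the line w1x1 + w2x2 = θ divides the (x1, x2) plane into a region where y = 1 and a region where y = 0.]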
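Hyperspace separation can be checked with a two-input unit. This sketch is illustrative only. The parameters w1 = w2 = 1 and θ = 1.5 are hypothetical. They place the boundary line x1 + x2 = 1.5 so that only the corner (1, 1) of the unit square falls on the y = 1 side, so the unit behaves as a logical AND.

```python
def threshold_unit(x1, x2, w1, w2, theta):
    # The line w1*x1 + w2*x2 = theta is the decision boundary.
    f = w1 * x1 + w2 * x2 - theta
    return 1 if f >= 0 else 0


# Hypothetical parameters: w1 = w2 = 1 and theta = 1.5.
# The line x1 + x2 = 1.5 puts (1, 1) on one side and the other three
# corners of the unit square on the other, so the unit computes AND.
corners = [(0, 0), (0, 1), (1, 0), (1, 1)]
print([threshold_unit(a, b, 1.0, 1.0, 1.5) for a, b in corners])  # [0, 0, 0, 1]
```

Moving only the threshold moves the line. For example, θ = 0.5 separates (0, 0) from the rest, which gives OR.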

What Are ANNs?

ANN: a neurally inspired mathematical model.
• It consists of a large number of highly interconnected PEs.
• Its connections (weights) hold knowledge.
• The response of a PE depends only on local information.
• Its collective behavior demonstrates its computational power.
• It has learning, recall, and generalization capabilities.

Three Basic Entities of ANN Models
• Models of neurons or PEs.
• Models of synaptic interconnections and structures.
• Training or learning rules.


Basic Models and Learning Rules
– Neuron Models
– ANN Structures
– Learning

Processing Elements

[Figure: a processing element i with integration function f(·), activation function a(·), and threshold θi.]

What integration functions can we use?

What activation functions can we use?

Extensions of M-P neurons

Integration Functions

Quadratic function:

$$f_i = \sum_{j=1}^{m} w_{ij} x_j^2 - \theta_i$$

Spherical function:

$$f_i = \sum_{j=1}^{m} (x_j - w_{ij})^2 - \theta_i$$

Polynomial function:

$$f_i = \sum_{j=1}^{m} \sum_{k=1}^{m} w_{ijk} x_j x_k + x_j^{\alpha_j} + x_k^{\alpha_k} - \theta_i$$

M-P neuron (linear):

$$f_i = net_i = \sum_{j=1}^{m} w_{ij} x_j - \theta_i$$
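Three of the integration functions can be compared on the same input. This is a minimal sketch. The input and weight values are hypothetical, and θ is set to 0 so the raw sums are easy to check by hand. The polynomial form is left out because its exponents α are not specified here. Note that the spherical function measures the squared distance of x from the "center" w, while the linear and quadratic forms are weighted sums.

```python
def linear(x, w, theta):
    # M-P neuron: net_i = sum_j w_ij * x_j - theta_i
    return sum(w_j * x_j for w_j, x_j in zip(w, x)) - theta


def quadratic(x, w, theta):
    # f_i = sum_j w_ij * x_j^2 - theta_i
    return sum(w_j * x_j ** 2 for w_j, x_j in zip(w, x)) - theta


def spherical(x, w, theta):
    # f_i = sum_j (x_j - w_ij)^2 - theta_i: squared distance from the center w
    return sum((x_j - w_j) ** 2 for w_j, x_j in zip(w, x)) - theta


# Hypothetical input and weights.
x = [1.0, 2.0]
w = [0.5, -1.0]
print(linear(x, w, 0.0))     # -1.5  = 0.5*1 + (-1)*2
print(quadratic(x, w, 0.0))  # -3.5  = 0.5*1 + (-1)*4
print(spherical(x, w, 0.0))  # 9.25  = (1 - 0.5)^2 + (2 + 1)^2
```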

Activation Functions

M-P neuron (step function):

$$a(f) = \begin{cases} 1 & f \ge 0 \\ 0 & \text{otherwise} \end{cases}$$

[Figure: graph of a(f) against f, stepping from 0 to 1 at f = 0.]


Hard limiter (threshold function):

$$a(f) = \operatorname{sgn}(f) = \begin{cases} 1 & f \ge 0 \\ -1 & f < 0 \end{cases}$$

[Figure: graph of a(f) against f, stepping from −1 to 1 at f = 0.]


Ramp function:

$$a(f) = \begin{cases} 1 & f > 1 \\ f & 0 \le f \le 1 \\ 0 & f < 0 \end{cases}$$

[Figure: graph of a(f) against f, rising linearly from 0 at f = 0 to 1 at f = 1.]


Unipolar sigmoid function:

$$a(f) = \frac{1}{1 + e^{-\lambda f}}$$

[Figure: graph of a(f) for f from −4 to 4, rising smoothly from 0 to 1.]


Bipolar sigmoid function:

$$a(f) = \frac{2}{1 + e^{-\lambda f}} - 1$$

[Figure: graph of y = a(x) for x from −4 to 4, rising smoothly from −1 to 1.]
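The five activation functions can be written side by side for comparison. This is a minimal sketch. The steepness parameter λ is called `lam`, and its default of 1 is an arbitrary assumption. The sample points in the loop are also arbitrary. Note that the bipolar sigmoid equals 2 × (unipolar sigmoid) − 1, so it is the unipolar curve rescaled to the range (−1, 1).

```python
import math


def step(f):
    # M-P neuron: 1 if f >= 0, otherwise 0
    return 1.0 if f >= 0 else 0.0


def hard_limiter(f):
    # sgn(f): 1 if f >= 0, otherwise -1
    return 1.0 if f >= 0 else -1.0


def ramp(f):
    # 0 below 0, f itself on [0, 1], 1 above 1
    if f > 1:
        return 1.0
    if f < 0:
        return 0.0
    return float(f)


def unipolar_sigmoid(f, lam=1.0):
    # 1 / (1 + e^(-lam*f)), values in (0, 1); math.exp overflows if -lam*f is huge
    return 1.0 / (1.0 + math.exp(-lam * f))


def bipolar_sigmoid(f, lam=1.0):
    # 2 / (1 + e^(-lam*f)) - 1, values in (-1, 1)
    return 2.0 / (1.0 + math.exp(-lam * f)) - 1.0


for f in (-2.0, 0.0, 0.5, 2.0):
    print(f, step(f), hard_limiter(f), ramp(f),
          round(unipolar_sigmoid(f), 3), round(bipolar_sigmoid(f), 3))
```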

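Example: Activation Surfaces

[Figure: three lines L1, L2, L3 in the xy-plane, and a network in which neurons L1, L2, L3 each receive the inputs x and y.]

Each neuron defines one line in the plane, given by its weights (on x, on y) and its threshold θ:
• L1: x − 1 = 0, weights (1, 0), θ1 = 1
• L2: y − 1 = 0, weights (0, 1), θ2 = 1
• L3: −x − y + 4 = 0, weights (−1, −1), θ3 = −4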


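Region Code

[Figure: the outputs of L1, L2, L3, read as a 3-bit code, label the seven regions cut out by the three lines: 011, 001, 101, 100, 110, 010, 111.]

The triangle bounded by all three lines (x ≥ 1, y ≥ 1, x + y ≤ 4) is region 111.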
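The region codes can be verified numerically. This sketch uses the example's line neurons with the step activation: L1 has weights (1, 0) and θ = 1, L2 has (0, 1) and θ = 1, and L3 has (−1, −1) and θ = −4. The helper names and the sample points are hypothetical.

```python
def step(f):
    return 1 if f >= 0 else 0


# (weight on x, weight on y, theta) for each line neuron
LINES = {
    "L1": (1.0, 0.0, 1.0),     # x - 1 = 0
    "L2": (0.0, 1.0, 1.0),     # y - 1 = 0
    "L3": (-1.0, -1.0, -4.0),  # -x - y + 4 = 0
}


def region_code(x, y):
    # Concatenate the outputs of L1, L2, L3 into a 3-bit code string.
    bits = []
    for name in ("L1", "L2", "L3"):
        wx, wy, theta = LINES[name]
        bits.append(str(step(wx * x + wy * y - theta)))
    return "".join(bits)


for point in [(2, 1.5), (0, 0), (2, 0), (5, 5)]:
    print(point, region_code(*point))
# (2, 1.5) 111  inside the triangle
# (0, 0) 001
# (2, 0) 101
# (5, 5) 110
```

Code 000 never appears, because x < 1, y < 1 and x + y > 4 cannot all hold at once. That is why three lines give seven regions here instead of eight.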


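[Figure: a fourth neuron L4 takes the outputs of L1, L2, L3 and produces z; z = 1 inside the triangular region and z = 0 elsewhere.]

L4 uses weights 1, 1, 1 and threshold θ4 = 2.5, so z = 1 only when all three of L1, L2, L3 output 1, i.e., in region 111.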

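Activation surface with the M-P neuron (step function):

$$a(f) = \begin{cases} 1 & f \ge 0 \\ 0 & \text{otherwise} \end{cases}$$

[Figure: the surface z(x, y) equals 1 over the triangular region and 0 elsewhere, with sharp edges.]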


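Activation surfaces with the unipolar sigmoid function, for λ = 2, 3, 5, 10:

$$a(f) = \frac{1}{1 + e^{-\lambda f}}$$

[Figure: four surfaces z(x, y) of the network L1, L2, L3, L4, one per value of λ; the edges of the triangular plateau become sharper as λ increases.]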
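The two-layer network of the example can be evaluated with either activation. This is a minimal sketch. It uses the example's weights (L1: (1, 0), θ = 1; L2: (0, 1), θ = 1; L3: (−1, −1), θ = −4; L4: weights 1, 1, 1, θ = 2.5). It assumes that the same activation function is used in both layers. The sample points and helper names are hypothetical.

```python
import math

LINES = [(1.0, 0.0, 1.0), (0.0, 1.0, 1.0), (-1.0, -1.0, -4.0)]  # (w_x, w_y, theta)
THETA4 = 2.5  # L4 adds the three outputs with weights 1, 1, 1


def step(f):
    return 1.0 if f >= 0 else 0.0


def make_sigmoid(lam):
    # Unipolar sigmoid with steepness lam
    return lambda f: 1.0 / (1.0 + math.exp(-lam * f))


def z(x, y, act):
    # First layer: L1, L2, L3
    h = [act(wx * x + wy * y - th) for wx, wy, th in LINES]
    # Second layer: L4
    return act(sum(h) - THETA4)


print(z(2, 1.5, step), z(0, 0, step))  # 1.0 0.0
for lam in (2, 3, 5, 10):
    s = make_sigmoid(lam)
    print(lam, round(z(2, 1.5, s), 3), round(z(0, 0, s), 3))
```

With the step function, z is exactly 1 inside the triangle and 0 outside. With the sigmoid, z varies smoothly and only approaches these values as λ grows. Near the edges, and for small λ, the sigmoid output inside the triangle can still be well below 1.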

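ANN Structure (Connections)

Single-Layer Feedforward Networks

[Figure: inputs x1, x2, …, xm are connected directly to outputs y1, y2, …, yn; weight wij links input xj to output yi (w11, w12, …, w1m, w21, w22, …, w2m, …, wn1, wn2, …, wnm).]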
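A single-layer feedforward network applies one neuron per output to the same input vector. This is a minimal sketch. It assumes the step activation and stores the weights as a list of rows, where row i holds wi1, …, wim (the weights into output yi). The 3-input, 2-output network below is hypothetical.

```python
def step(f):
    return 1 if f >= 0 else 0


def single_layer(x, W, theta, act=step):
    """y_i = a(sum_j w_ij * x_j - theta_i) for each of the n output neurons."""
    return [act(sum(w_ij * x_j for w_ij, x_j in zip(row, x)) - th)
            for row, th in zip(W, theta)]


# Hypothetical network with m = 3 inputs and n = 2 outputs.
W = [[1.0, 1.0, 0.0],   # w11 w12 w13
     [0.0, -1.0, 1.0]]  # w21 w22 w23
theta = [1.5, 0.5]
print(single_layer([1, 1, 0], W, theta))  # [1, 0]
```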

Multilayer Feedforward Networks

[Figure: an input layer (x1, x2, …, xm), a hidden layer, and an output layer (y1, y2, …, yn), with connections running forward from each layer to the next.]
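In a multilayer feedforward network, the output of each layer becomes the input of the next. This sketch assumes step activations and uses a hypothetical 2-2-1 network. The first hidden unit computes OR and the second computes AND. The output unit fires for "OR but not AND", which is XOR. No single M-P neuron can compute XOR, because no single line separates its 1-outputs from its 0-outputs, so the hidden layer is needed here.

```python
def step(f):
    return 1 if f >= 0 else 0


def layer(x, W, theta):
    # One layer of M-P neurons; row i of W holds the weights into neuron i.
    return [step(sum(w * xj for w, xj in zip(row, x)) - th)
            for row, th in zip(W, theta)]


def multilayer(x, layers):
    # Feed the input forward: each layer's output is the next layer's input.
    for W, theta in layers:
        x = layer(x, W, theta)
    return x


# Hypothetical 2-2-1 network (weights and thresholds chosen by hand).
hidden = ([[1.0, 1.0], [1.0, 1.0]], [0.5, 1.5])  # OR unit, AND unit
output = ([[1.0, -1.0]], [0.5])                  # OR and not AND
for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((a, b), multilayer([a, b], [hidden, output]))
```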

Multilayer Feedforward Networks

Pattern Recognition

Input → Analysis → Classification → Output

Learning: where does the knowledge come from?

Single Node with Feedback to Itself

[Figure: a single node whose output returns to its own input through a feedback loop.]

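Single-Layer Recurrent Networks

[Figure: inputs x1, x2, …, xm feed a single layer of neurons with outputs y1, y2, …, yn; the outputs are fed back as inputs to the layer.]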
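One way to run a single-layer recurrent network is to iterate the feedback until the outputs stop changing. This is a sketch under stated assumptions, not the lecture's definition. It assumes that the external input only sets the initial state y(0), that updates are synchronous with y(t+1) = sgn(W y(t) − θ), and that the network has a hard-limiter activation. The weight matrix is hypothetical: it is the outer product of the pattern (1, 1, −1) with itself, with the diagonal set to zero.

```python
def sgn(f):
    return 1 if f >= 0 else -1


def run_recurrent(x, W, theta, steps=10):
    # Assumption: x sets the initial state y(0); afterwards every output
    # is fed back to the layer: y(t+1) = sgn(W y(t) - theta).
    y = list(x)
    for _ in range(steps):
        y_next = [sgn(sum(w * yj for w, yj in zip(row, y)) - th)
                  for row, th in zip(W, theta)]
        if y_next == y:  # fixed point: feedback no longer changes the state
            break
        y = y_next
    return y


# Hypothetical weights: outer product of (1, 1, -1) with itself, zero diagonal.
W = [[0, 1, -1],
     [1, 0, -1],
     [-1, -1, 0]]
theta = [0, 0, 0]
print(run_recurrent([1, -1, -1], W, theta))  # [1, 1, -1]
```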

Multilayer Recurrent Networks

[Figure: a multilayer network with inputs x1, x2, x3 and outputs y1, y2, y3, with feedback connections between layers.]

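Learning

Consider an ANN with n neurons, each with m adaptive weights.

Weight matrix:

$$\mathbf{W} = \begin{bmatrix} \mathbf{w}_1^T \\ \mathbf{w}_2^T \\ \vdots \\ \mathbf{w}_n^T \end{bmatrix} = \begin{bmatrix} w_{11} & w_{12} & \cdots & w_{1m} \\ w_{21} & w_{22} & \cdots & w_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ w_{n1} & w_{n2} & \cdots & w_{nm} \end{bmatrix}$$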


The goal: to “learn” the weight matrix.

How?

Learning Rules
• Supervised learning
• Reinforcement learning
• Unsupervised learning

Supervised Learning
• Learning with a teacher
• Learning by examples

Training set:

$$\mathcal{T} = \{(\mathbf{x}^{(1)}, \mathbf{d}^{(1)}), (\mathbf{x}^{(2)}, \mathbf{d}^{(2)}), \ldots, (\mathbf{x}^{(k)}, \mathbf{d}^{(k)})\}$$

Supervised Learning

[Figure: the input x feeds the ANN (weights W), which produces the output y; an error-signal generator compares y with the desired output d and feeds the error back to adjust W.]

Reinforcement Learning
• Learning with a critic
• Learning by comments

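Reinforcement Learning

[Figure: the input x feeds the ANN (weights W), which produces the output y; a critic-signal generator evaluates the output and returns a reinforcement signal used to adjust W.]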

Unsupervised Learning
• Self-organizing
• Clustering
  – Form proper clusters by discovering the similarities and dissimilarities among objects.

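Unsupervised Learning

[Figure: the input x feeds the ANN (weights W), which produces the output y; no teacher or critic signal is provided.]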

The General Weight Learning Rule

[Figure: neuron i with inputs x1, x2, …, xj, …, xm−1, weights wi1, wi2, …, wij, …, wi,m−1, threshold θi, and output yi.]

Input:

$$net_i = \sum_{j=1}^{m-1} w_{ij} x_j - \theta_i$$

Output:

$$y_i = a(net_i)$$

We want to learn the weights and the bias.


Let x_m = −1 and w_im = θ_i. Because w_im x_m = −θ_i, the threshold becomes an ordinary weight on a constant input:

$$net_i = \sum_{j=1}^{m} w_{ij} x_j$$

[Figure: the same neuron with an extra input xm = −1 connected through the weight wim = θi.]
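The bias trick can be checked directly. Appending the constant input x_m = −1 with weight w_im = θ_i gives the same net input as subtracting θ_i explicitly. This is a minimal sketch, and the numbers are hypothetical.

```python
def net_with_threshold(x, w, theta):
    # net_i = sum_{j=1}^{m-1} w_ij * x_j - theta_i
    return sum(w_j * x_j for w_j, x_j in zip(w, x)) - theta


def net_augmented(x, w, theta):
    # Append x_m = -1 and w_im = theta_i, then net_i = sum_{j=1}^{m} w_ij * x_j
    x_aug = list(x) + [-1.0]
    w_aug = list(w) + [theta]
    return sum(w_j * x_j for w_j, x_j in zip(w_aug, x_aug))


x, w, theta = [0.5, 2.0], [1.0, -0.25], 0.75  # hypothetical values
print(net_with_threshold(x, w, theta), net_augmented(x, w, theta))  # -0.75 -0.75
```

Because of this equivalence, a learning rule only has to adapt one weight vector. The threshold is learned as just another weight.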


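[Figure: the augmented neuron i with inputs x1, …, xm−1 and xm = −1, weights wi1, …, wim (with wim = θi), and output yi.]

We want to learn

$$\mathbf{w}_i = (w_{i1}, w_{i2}, \ldots, w_{im})^T$$

How should the weights change at each step, i.e., what is $\Delta\mathbf{w}_i(t)$?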


[Figure: a learning-signal generator receives the input x, the current weights wi, and the desired output di, and produces the learning signal r that adjusts wi; the neuron outputs yi.]

Learning signal:

$$r = f_r(\mathbf{w}_i, \mathbf{x}, d_i)$$

Weight increment, where η is the learning rate:

$$\Delta\mathbf{w}_i(t) = \eta\, r\, \mathbf{x}(t)$$

Discrete-Time Weight Modification Rule:

$$\mathbf{w}_i^{(t+1)} = \mathbf{w}_i^{(t)} + \eta\, f_r(\mathbf{w}_i^{(t)}, \mathbf{x}^{(t)}, d_i^{(t)})\, \mathbf{x}^{(t)}$$

Continuous-Time Weight Modification Rule:

$$\frac{d\mathbf{w}_i(t)}{dt} = \eta\, r\, \mathbf{x}(t)$$
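The discrete-time rule is a generic update in which only the learning signal f_r changes from one learning law to another. This is a minimal sketch. The lecture gives a concrete f_r only for Hebb's rule. The error-based signal r = d − wᵀx used here is a hypothetical choice, included only to exercise the update. The input vector's last component plays the role of the constant bias input x_m = −1.

```python
def general_update(w, x, d, eta, f_r):
    """One discrete-time step: w(t+1) = w(t) + eta * f_r(w(t), x(t), d(t)) * x(t)."""
    r = f_r(w, x, d)  # learning signal
    return [w_j + eta * r * x_j for w_j, x_j in zip(w, x)]


def error_signal(w, x, d):
    # Hypothetical learning signal for illustration only: r = d - w^T x
    return d - sum(w_j * x_j for w_j, x_j in zip(w, x))


w = [0.0, 0.0, 0.0]
x = [1.0, 2.0, -1.0]  # last component is the bias input x_m = -1
for t in range(3):
    w = general_update(w, x, d=1.0, eta=0.1, f_r=error_signal)
    print(t, [round(v, 4) for v in w])
```

Passing f_r as a function keeps the update rule fixed while the learning law varies. For example, Hebb's law would supply r = y_i and ignore d.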

Hebb’s Learning Law

• Hebb [1949] hypothesized that when an axonal input from neuron A to neuron B causes B to immediately emit a pulse (fire), and this happens repeatedly or persistently,
• then the efficacy of that axonal input, in terms of its ability to help neuron B fire in the future, is somehow increased.
• Hebb’s learning rule is an unsupervised learning rule.

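Hebb’s Learning Law

$$r = f_r(\mathbf{w}_i, \mathbf{x}, d_i) = a(\mathbf{w}_i^T \mathbf{x}) = y_i$$

$$\Delta\mathbf{w}_i = \eta\, y_i\, \mathbf{x}$$

$$\Delta w_{ij} = \eta\, y_i\, x_j$$

[Figure: when the input xj and the output yi are both positive (+, +), Δwij is positive and the connection is strengthened.]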
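Hebb's law plugs the neuron's own output into the general rule as the learning signal, so no desired output d is needed. This is a minimal sketch. It assumes a step activation, a learning rate of 0.1, and hypothetical starting weights and input.

```python
def step(f):
    return 1.0 if f >= 0 else 0.0


def hebb_update(w, x, eta=0.1, a=step):
    # Learning signal r = y_i = a(w^T x); no desired output d is used.
    y = a(sum(w_j * x_j for w_j, x_j in zip(w, x)))
    # Delta w_ij = eta * y_i * x_j
    return [w_j + eta * y * x_j for w_j, x_j in zip(w, x)], y


w = [0.2, -0.1, 0.0]  # hypothetical initial weights
x = [1.0, 0.0, 1.0]   # inputs 1 and 3 active, input 2 silent
for _ in range(3):
    w, y = hebb_update(w, x)
    print(y, [round(v, 2) for v in w])
```

Each time the neuron fires together with inputs 1 and 3, their weights grow by η. The weight on the silent input 2 stays unchanged. Nothing in the rule itself bounds this growth.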


Distributed Representations

• Distributed representation:
  – An entity is represented by a pattern of activity distributed over many PEs.
  – Each processing element is involved in representing many different entities.
• Local representation:
  – Each entity is represented by one PE.

Example

          P0 P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13 P14 P15
Dog       +  -  +  +  -  -  -  -  +  +  +   +   +   -   -   -
Cat       +  -  +  +  -  -  -  -  +  -  +   -   +   +   -   +
Bread     +  +  -  +  -  +  +  -  +  -  -   +   +   +   +   -

Advantages

• Act as a content-addressable memory.
  [Figure: only a partial pattern, + + + +, is presented.] What is this?
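Content-addressable recall can be sketched as picking the stored pattern that best agrees with a partial cue. This is a minimal sketch under an assumed recall mechanism: the largest dot product, i.e., the nearest stored pattern. The lecture does not specify how recall works. The patterns are the Dog/Cat/Bread rows of the example table, with + as +1 and − as −1. The cue positions are hypothetical, because the slide's partial pattern does not say which PEs are known.

```python
# Stored patterns from the Dog/Cat/Bread table: '+' -> +1, '-' -> -1.
PATTERNS = {
    "Dog":   "+-++----+++++---",
    "Cat":   "+-++----+-+-++-+",
    "Bread": "++-+-++-+--++++-",
}


def to_vec(s):
    return [1 if c == "+" else -1 for c in s]


def recall(cue):
    """Return the stored entity whose pattern best matches the cue.

    cue holds +1/-1 for PEs whose state is known and 0 for unknown PEs,
    so unknown PEs contribute nothing to the dot-product score.
    """
    scores = {name: sum(c * p for c, p in zip(cue, to_vec(s)))
              for name, s in PATTERNS.items()}
    return max(scores, key=scores.get), scores


# Hypothetical partial cue: only P9, P11 and P13 are known.
cue = [0] * 16
cue[9], cue[11], cue[13] = 1, 1, -1
print(recall(cue))  # ('Dog', {'Dog': 3, 'Cat': -3, 'Bread': -1})
```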

• Make induction easy.

          P0 P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13 P14 P15
Fido      +  -  -  +  -  -  -  -  +  +  +   +   +   +   -   -

A dog has 4 legs. How many legs does Fido have?
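Induction can be sketched as "Fido's pattern lies closest to Dog's, so Fido inherits what is known about dogs." This is a minimal sketch under two assumptions. First, similarity is measured by the dot product of ±1 patterns, which equals (number of agreeing PEs) − (number of disagreeing PEs). Second, the property table `LEGS` is a hypothetical store of facts attached to each entity. The patterns come from the example tables.

```python
PATTERNS = {
    "Dog":   "+-++----+++++---",
    "Cat":   "+-++----+-+-++-+",
    "Bread": "++-+-++-+--++++-",
}
LEGS = {"Dog": 4, "Cat": 4, "Bread": 0}  # hypothetical facts about stored entities


def to_vec(s):
    return [1 if c == "+" else -1 for c in s]


def similarity(a, b):
    # Dot product of two +1/-1 patterns: agreements minus disagreements
    return sum(x * y for x, y in zip(to_vec(a), to_vec(b)))


fido = "+--+----++++++--"
scores = {name: similarity(fido, s) for name, s in PATTERNS.items()}
nearest = max(scores, key=scores.get)
print(scores)                  # {'Dog': 12, 'Cat': 8, 'Bread': 4}
print(nearest, LEGS[nearest])  # Dog 4
```

Fido differs from Dog at only two PEs (P2 and P13), so the shared activity pattern carries the "4 legs" association over without storing it separately for Fido.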

• Make the creation of new entities or concepts easy (without allocation of new hardware).

          P0 P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13 P14 P15
Doughnut  +  +  -  -  -  +  +  -  +  -  -   -   +   +   +   -

Add the doughnut by changing weights.

• Fault tolerance.
  The breakdown of some PEs does not cause problems.
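Fault tolerance can be illustrated by silencing a few PEs in a stored pattern and checking that recall still succeeds. This is a minimal sketch. It assumes the same dot-product recall as a nearest-pattern lookup, which is not specified by the lecture, and it assumes that a broken PE outputs 0. The set of failed PEs is hypothetical.

```python
PATTERNS = {
    "Dog":   "+-++----+++++---",
    "Cat":   "+-++----+-+-++-+",
    "Bread": "++-+-++-+--++++-",
}


def to_vec(s):
    return [1 if c == "+" else -1 for c in s]


def recall(state):
    # Nearest stored pattern by dot product; broken PEs (0) contribute nothing.
    scores = {name: sum(a * b for a, b in zip(state, to_vec(s)))
              for name, s in PATTERNS.items()}
    return max(scores, key=scores.get), scores


dog = to_vec(PATTERNS["Dog"])
broken = {2, 9, 13}  # hypothetical PEs that have failed
damaged = [0 if i in broken else v for i, v in enumerate(dog)]
print(recall(damaged))  # ('Dog', {'Dog': 13, 'Cat': 9, 'Bread': 3})
```

Because each entity is spread over sixteen PEs, losing three of them lowers every score. Dog still wins by a clear margin.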

Disadvantages
• How to understand?
• How to modify?

Learning procedures are required.
