
The Essence of Computational Intelligence
計算型智慧的基本概念

Offered by 蘇順豐 Shun-Feng Su
E-mail: [email protected]
Department of Electrical Engineering, National Taiwan University of Science and Technology

March, 2009
© Shun-Feng Su

 


Preface

People have always dreamed of having machines that can act like humans.

Artificial intelligence studies the components that can make such a dream possible.

Because of the nature of knowledge, traditional artificial intelligence uses symbols to construct the conceptual world.


Preface

Symbolic artificial intelligence is very difficult to manipulate for real-world problems, especially when implementing common-sense knowledge.

Recently, computational intelligence (CI) has become widely used and has demonstrated good performance in various applications.

The name CI distinguishes it from traditional symbolic artificial intelligence: by using numerical knowledge representation, CI is much easier to manipulate.


Preface

The following three methodologies are often considered as CI:

Fuzzy Systems, Neural Networks, and Genetic Algorithms (also referred to as Evolutionary Computation).

This talk provides the fundamental concepts and ideas behind these often-mentioned techniques.


Basics for CI

CI is known to have the following characteristics [1]:

Numerical knowledge representation; adaptability; fault tolerance; fast processing speed; error-rate optimality.

[1] J. C. Bezdek, "What is computational intelligence?" in Computational Intelligence: Imitating Life, J. M. Zurada, R. J. Marks II, and C. J. Robinson, Eds., New York: IEEE Press, pp. 1-12, 1994.


Basics for CI

Possible advantages of using CI are:

Efficiency; Robustness; Good generalization capability; Easy to use; Easy to incorporate problem domain heuristics; Superior performance in various applications.


Generalization capability means having a fair chance to behave as required for any input data.


Basics for CI

Possible problems encountered while using CI are:

Incomprehensibility of the acquired knowledge; lack of theoretical analysis tools (e.g., for stability or performance guarantees); various subjective parameters required; lack of benchmarks for performance evaluation.

These may be disadvantages, but CI may still provide a good means for applications.


Outline

Introduction
Fuzzy Systems: Uncertainty and Its Representation; Fuzzy Operations and Uncertainty Reasoning; Fuzzy Logic Control
Neural Networks
Genetic Algorithms
Epilogue


Introduction of Fuzzy Systems

Fuzzy systems have been widely used in various applications.

In fact, the fundamental idea behind fuzzy systems is to include uncertainty in the process.

Such an inclusion provides extra information so that the systems can be more accurate.

In other words, "fuzzy" means vagueness, but it can provide accuracy because of this extra information.


Uncertainties in Intelligent systems

Uncertainties exist for the following reasons:

noise always exists in the environment;
facts being true or events occurring may not be certain;
stored knowledge is incomplete or liable to change;
exceptions are inevitable for any realistic knowledge;
simplifications are necessary to reduce the complexity of the system;
partitioning continuous variables for rule-based knowledge results in the fuzzy set concept.


Uncertainties in Intelligent systems

Traditional systems always use nominal values to reason and to make decisions. However, using more information may yield more accurate decision making.

Thus, to act intelligently, those uncertainties cannot be ignored in the computation.

To incorporate uncertainties into the decision-making process, the system must be capable of representing uncertainty and must also be equipped with the capability of approximate reasoning.


Fuzzy Sets As A Representation for Uncertainty

The traditional sets are called classical sets or crisp sets.

In a crisp set, membership is crisp and can be described by a simple yes/no answer; that is, an element is either in the set or not in the set. The membership function of A is defined as

\mu_A(x) = \begin{cases} 1, & \text{when } x \in A, \\ 0, & \text{when } x \notin A. \end{cases}


Fuzzy Sets As A Representation for Uncertainty

The range of the membership function \mu_A of a fuzzy set A is now the interval [0,1] instead of only the binary values {0,1}.

Example: Let a fuzzy set A represent the concept “real numbers that are close to 5” and the membership function for A is

\mu_A(x) = \frac{1}{1 + 10(x-5)^2}.
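As a quick illustration (a minimal sketch, not from the original slides), the membership degrees of a few numbers under this definition can be computed directly:

```python
def mu_close_to_5(x):
    """Membership degree of x in the fuzzy set "real numbers close to 5"."""
    return 1.0 / (1.0 + 10.0 * (x - 5.0) ** 2)

for x in (5.0, 4.5, 6.0, 10.0):
    print(f"mu({x}) = {mu_close_to_5(x):.3f}")
# x = 5 gives degree 1.0; the degree drops toward 0 as x moves away from 5.
```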


Fuzzy Sets As A Representation for Uncertainty

For example,

When x = 62mph, M(x)=0.4667, F(x)=0.5333.

When x = 63mph, M(x)=0.5333, F(x)=0.4667.

When x = 69mph, M(x)=0.0667, F(x)=0.9333.


Uncertainty Representations

Two often-used uncertainty representations are fuzzy sets and probability. From the viewpoint of the uncertainty concept itself, they capture two different types of uncertainty.

A fuzzy set captures the idea of vagueness: it indicates the degree of uncertainty about what something is.

What is rain? What is fast?

Probability captures the idea of ambiguity: it indicates uncertainty about whether something is there.

Will it rain? What will the outcome of a die roll be?


Fuzzy vs. Probability

From the mathematical representation viewpoint, they are comparable and possess different reasoning behaviors.

Reasoning with probabilities is mathematically sound but difficult to manipulate because it lacks modularity.

Reasoning with fuzzy sets does not provide mathematically sound inference and is subjective, but it is easy to manipulate.

In fact, other types of uncertainty can be found in the literature.


Outline

Introduction
Fuzzy Systems: Uncertainty and Its Representation; Fuzzy Operations and Uncertainty Reasoning; Fuzzy Logic Control
Neural Networks
Genetic Algorithms
Epilogue


Operations on Fuzzy Sets – The Extension Principle

Given a function f : U → V, suppose the input is now a fuzzy set A in U. What will the output be?

The extension principle states that the membership degree of x in A becomes the membership degree of y = f(x) in f(A).

The concept is to pass the membership degree of x to f(x); i.e., the function itself is crisp and introduces no uncertainty, so the membership degree of x carries over to f(x).


Extension Principle

Two problems arise:

f(x) may be many-to-one, i.e., f(x1) = f(x2) with x1 ≠ x2. The output membership degree could then come from either x1 or x2, so the resultant membership degree is taken as the maximum, μ(x1) ∨ μ(x2).

The input domain may consist of multiple variables. Then f(x1, x2, …, xn) is obtained only when x1, x2, …, xn all appear, so the output membership degree is the minimum (t-norm), μ(x1) ∧ μ(x2) ∧ … ∧ μ(xn).
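A minimal sketch of the extension principle for a discrete fuzzy set is shown below; the fuzzy set and the function f(x) = x² are illustrative assumptions, and many-to-one collisions are resolved by max as described above.

```python
from collections import defaultdict

def extend(f, fuzzy_set):
    """Image of a discrete fuzzy set under a crisp function f (extension principle).
    fuzzy_set maps elements x to membership degrees; many-to-one collisions
    are resolved with max, following the sup-min form of the principle."""
    image = defaultdict(float)
    for x, mu in fuzzy_set.items():
        y = f(x)
        image[y] = max(image[y], mu)
    return dict(image)

A = {-2: 0.3, -1: 0.7, 0: 1.0, 1: 0.7, 2: 0.3}   # "close to 0" (assumed)
print(extend(lambda x: x * x, A))                 # f(x) = x^2 is many-to-one
# {4: 0.3, 1: 0.7, 0: 1.0}: the degrees of -1 and 1 are merged by max.
```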


Extension Principle

The extension principle allows the generalization of crisp mathematical concepts to the fuzzy set framework, and extends point-to-point mappings to mappings between fuzzy sets.

It provides a means for any function f that maps an n-tuple (x1, x2, …, xn) in the crisp set U to a point in the crisp set V to be generalized to a mapping from n fuzzy subsets of U to a fuzzy subset of V.

Any mathematical relationship between non-fuzzy elements can be extended to deal with fuzzy entities.


Classic Logic Reasoning

Logical reasoning derives other true propositions (facts) from given true propositions (knowledge and/or facts).

The scenario of logical reasoning can be interpreted as follows: there is a knowledge base containing facts or rules. A new piece of information, or a description of the current situation, is specified. We then want to find out what the system can conclude or which action should be taken under the current circumstances.

The traditional inference rule is called modus ponens: (A ∧ (A → B)) → B. That is, a piece of knowledge A → B together with a fact A yields the fact B.


Approximate Reasoning for Fuzzy sets

The most used inference rule is (A1 ∧ (A2 → B)) → B. In classical logic, either A1 = A2 or A1 ≠ A2; therefore, with the match-and-fire property, either B is concluded or it is not.

But with the use of fuzzy sets, A1 or A2 (or both) is a fuzzy set. What can the reasoning process then conclude?

Example: (speed = 95 km/hr) ∧ (speed is too fast → pull back the throttle). Should the throttle be pulled back?

In the most common case, A2 is a fuzzy set and A1 is a fuzzy singleton (a crisp value). Note that B can be a crisp value or a fuzzy set. However, the rule A2 → B itself is hardly ever fuzzy.


Approximate Reasoning for Fuzzy sets

The most used reasoning format is a categorical reasoning scheme called the compositional rule of inference, or the generalized modus ponens:

(X is A) and (IF (X is B) THEN (Y is C)) results in (Y is A_R),

where X and Y are fuzzy variables and A, B, and C are fuzzy labels (sets). Note that the resultant A_R for Y is a fuzzy set. Usually, the membership function of A_R is computed as

\mu_{A_R}(v) = \max_{u} \min\left( \mu_A(u),\, t(\mu_B(u), \mu_C(v)) \right).


Approximate Reasoning for Fuzzy sets

The above result can be viewed as an instance of the extension principle:

\mu_{A_R}(v) = \max_{u} \min\left( \mu_A(u),\, t(\mu_B(u), \mu_C(v)) \right)

decides whether v is in Y by selecting over the various u (the max operation) the combination (the min operation) of the degree to which x = u exists in A and the degree to which x = u is related to y = v.


Approximate Reasoning for Fuzzy sets

Note that from the logic viewpoint, the implication p → q is equivalent to ¬p ∨ q. However, this equivalence only states that the truth value of p → q equals that of ¬p ∨ q. In the reasoning here, the implication is assumed to be true, and the question is whether the current situation (x = u and y = v) matches the rule IF (X is B) THEN (Y is C).

Therefore, the most commonly used relation is the t-norm of \mu_B(u) and \mu_C(v).


Approximate Reasoning for Fuzzy sets

Example:
R1: IF (X is A1) and (Y is B1) THEN (Z is C1).
R2: IF (X is A2) and (Y is B2) THEN (Z is C2).

Now the input is (x0, y0): each rule fires to the degree min(μ_{Ai}(x0), μ_{Bi}(y0)), its consequent Ci is clipped at that level, and the clipped consequents are aggregated (the original slide shows this graphically).
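A minimal numerical sketch of this two-rule max–min inference follows; the triangular membership functions and the input values are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function with support [a, c] and peak at b."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

z = np.linspace(0.0, 10.0, 201)                       # output universe
x0, y0 = 3.0, 6.0                                     # crisp inputs (assumed)

# Rule 1: IF (X is A1) and (Y is B1) THEN (Z is C1)
w1 = min(tri(x0, 0, 2, 5), tri(y0, 2, 5, 8))          # firing strength (min)
out1 = np.minimum(w1, tri(z, 0, 3, 6))                # clipped consequent C1

# Rule 2: IF (X is A2) and (Y is B2) THEN (Z is C2)
w2 = min(tri(x0, 2, 5, 8), tri(y0, 5, 8, 10))
out2 = np.minimum(w2, tri(z, 4, 7, 10))

agg = np.maximum(out1, out2)                          # max aggregation over rules
print(w1, w2, agg.max())                              # agg would then be defuzzified
```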


Outline

Introduction
Fuzzy Systems: Uncertainty and Its Representation; Fuzzy Operations and Uncertainty Reasoning; Fuzzy Logic Control
Neural Networks
Genetic Algorithms
Epilogue


Fuzzy Logic Control

A Fuzzy Logic Controller (FLC) is a controller described by a collection of fuzzy rules (e.g. IF-THEN rules) involving linguistic variables.

The original idea behind fuzzy control is to incorporate the "expert experience" of human operators into the design of controllers.

The utilization of linguistic variables, fuzzy control rules and approximate reasoning provides a means to incorporate human expert experience in designing the controller.


Rationale behind Fuzzy Logic Control

In an FLC, the rule structure provides the adaptation among strategies, and the fuzzy mechanism provides the interpolating capability among rules.

With this interpolating capability, the transition between rules is gradual rather than abrupt; this is the so-called softening process.

But in recent developments, fuzzy control is used because it consists of multiple strategies (rules or controllers) for different situations, which can of course give better control performance than a single complicated controller.


Basic Structure of Fuzzy Logic Control

A typical architecture of an FLC consists of four principal components: a fuzzifier, a fuzzy rule base, an inference engine, and a defuzzifier.


Fuzzy Logic Control

• Knowledge is usually expressed in a rule structure, and rule structures need a partition of the input space.

• Fuzzy control uses a fuzzy partition.


Fuzzy Logic Control

With a fuzzy partition:

To use fuzzy rules, the input values must be transformed into fuzzy labels.

The consequences of all matched rules must be transformed into actions.

Such a rule base with fuzzy partitions is also referred to as a fuzzy system.


Basic Structure of Fuzzy Logic Control

The fuzzifier is to transform crisp measured data (e.g., speed=100Km/hr) into suitable linguistic labels (e.g. speed is too fast).

The fuzzy rule base stores the knowledge in rule forms about how to control the system to be controlled (e.g., IF “speed is too low” THEN “increase the throttle setting”).


Basic Structure of Fuzzy Logic Control

The inference engine is to infer desired control strategies from rules by performing approximate reasoning based on current states.

The defuzzifier yields a non-fuzzy action or decision from the control strategy (a fuzzy set) inferred by the inference engine.


Fuzzy Systems

Mamdani fuzzy rules :

If (X is A) and (Y is B) … then (Z is C)

Note that C is a fuzzy set.

TSK (in modeling) or TS (in control) fuzzy rules :

If (X is A) and (Y is B) … then Z=f(X,Y).

Now, f() is a crisp function.


Fuzzy Systems

Mamdani fuzzy rules : If (X is A) and (Y is B) … then (Z is C)

TSK (in modeling) or TS (in control) fuzzy rules : If (X is A) and (Y is B) … then Z=f(X,Y).

The approximate reasoning for the output of a fuzzy rule is obtained from the extension principle as

\mu_{A_R}(v) = \max_{u} \min\left( \mu_A(u),\, t(\mu_B(u), \mu_C(v)) \right).


Fuzzy Systems

Mamdani fuzzy rules: COA (center-of-area) defuzzification.

To find the center of the area, numerical integration is needed.
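A minimal sketch of COA defuzzification over a sampled output universe; the aggregated membership function below is an illustrative assumption.

```python
import numpy as np

def coa_defuzzify(z, mu):
    """Center-of-area defuzzification, z* = ∫ z·mu(z) dz / ∫ mu(z) dz,
    approximated by a Riemann sum on a uniformly sampled universe
    (the common grid spacing cancels in the ratio)."""
    return np.sum(z * mu) / np.sum(mu)

z = np.linspace(0.0, 10.0, 201)                                # output universe
mu1 = np.minimum(0.6, np.maximum(0.0, 1 - np.abs(z - 3) / 3))  # clipped consequent 1
mu2 = np.minimum(0.3, np.maximum(0.0, 1 - np.abs(z - 7) / 3))  # clipped consequent 2
mu = np.maximum(mu1, mu2)                                      # max aggregation
print(coa_defuzzify(z, mu))   # crisp output, pulled toward the more strongly fired rule
```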


Fuzzy Systems

TS fuzzy rules: the result is sometimes also called COA, but no numerical integration is needed. The output is obtained as

z = \frac{\sum_{i=1}^{m} \alpha_i f_i}{\sum_{i=1}^{m} \alpha_i},

where \alpha_i and f_i are the firing strength and the fired result of the i-th rule, and m is the number of rules.

This is simple and easy to calculate. Most importantly, it can be used in any mathematical operation, such as differentiation.
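A minimal sketch of this TS weighted average; the two rules, their triangular memberships, and their linear consequent functions are illustrative assumptions.

```python
def ts_output(x, y, rules):
    """TS fuzzy inference: weighted average of crisp consequents f_i(x, y),
    weighted by the firing strengths alpha_i (min of the premise memberships)."""
    alphas = [min(mu_x(x), mu_y(y)) for mu_x, mu_y, _ in rules]
    outputs = [f(x, y) for _, _, f in rules]
    num = sum(a * o for a, o in zip(alphas, outputs))
    den = sum(alphas)
    return num / den if den > 0 else 0.0

# Two illustrative rules: IF (X is A_i) and (Y is B_i) THEN z = f_i(X, Y)
rules = [
    (lambda x: max(0.0, 1 - abs(x - 2) / 2), lambda y: max(0.0, 1 - abs(y - 5) / 3),
     lambda x, y: 0.5 * x + 0.1 * y),
    (lambda x: max(0.0, 1 - abs(x - 6) / 2), lambda y: max(0.0, 1 - abs(y - 8) / 3),
     lambda x, y: 1.2 * x - 0.3 * y),
]
print(ts_output(3.0, 6.0, rules))
```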


Fuzzy Systems

Thus, it can be found that in recent developments most approaches consider TS (or TSK) fuzzy models.

TS fuzzy models also have another advantage in applications: the output of a TS fuzzy model can be more sensitive to changes in the inputs.

This can eliminate the chattering effect in the final control stage that occurs when traditional fuzzy models (Mamdani fuzzy rules) are used.


Fuzzy System

A fuzzy approximator is constructed from a set of fuzzy rules of the form

R_F^l: IF x_1 is A_1^l and … and x_n is A_n^l, THEN y is \theta^l, for l = 1, 2, …, M.

Generally, \theta^l is a fuzzy singleton.

In the literature, this fuzzy model can be regarded both as a Mamdani fuzzy model (with singleton consequent fuzzy sets) and as a TS fuzzy model (with a crisp constant as the consequent function).

It is a commonly used fuzzy model in control.


Fuzzy System

For the same rule form, R_F^l: IF x_1 is A_1^l and … and x_n is A_n^l, THEN y is \theta^l, with \theta^l a fuzzy singleton, no numerical integration is needed and no membership functions appear in the consequents. To me, it is therefore a TS fuzzy model.


Fuzzy System

A fuzzy system with center-of-area-like defuzzification and product inference can be written as

y = f(\mathbf{x}) = \frac{\sum_{l=1}^{M} \theta^l \prod_{i=1}^{n} \mu_{A_i^l}(x_i)}{\sum_{l=1}^{M} \prod_{i=1}^{n} \mu_{A_i^l}(x_i)},

where the product \prod_{i=1}^{n} \mu_{A_i^l}(x_i) is the t-norm operation over all premise parts of rule l.

It is a universal function approximator and can be written compactly as y = f_{\theta}(\mathbf{x}) = \boldsymbol{\theta}^{T} \boldsymbol{\omega}(\mathbf{x}).


Fuzzy System

The same fuzzy system, y = f_{\theta}(\mathbf{x}) = \boldsymbol{\theta}^{T} \boldsymbol{\omega}(\mathbf{x}), is what is used in adaptive fuzzy control.

It is simple and differentiable. Note that \boldsymbol{\omega} is a function of the states.
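A minimal sketch of evaluating such a singleton fuzzy system y = θᵀω(x); the Gaussian membership functions, rule centers, widths, and consequents are illustrative assumptions.

```python
import numpy as np

def fuzzy_basis(x, centers, widths):
    """Fuzzy basis functions omega(x) for a singleton/TS fuzzy system with
    Gaussian memberships and product inference.
    centers, widths: arrays of shape (M, n) for M rules over n inputs."""
    mu = np.exp(-((x - centers) / widths) ** 2)   # (M, n) premise memberships
    w = np.prod(mu, axis=1)                       # product t-norm per rule
    return w / np.sum(w)                          # normalized firing strengths

# Illustrative 3-rule, 2-input system: y = theta^T omega(x)
centers = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.5]])
widths = np.full((3, 2), 0.8)
theta = np.array([0.5, -1.0, 2.0])               # singleton consequents

x = np.array([1.2, 0.7])
omega = fuzzy_basis(x, centers, widths)
print(theta @ omega)                              # crisp output, linear in theta
```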


Fuzzy System

It should be noted that the above system is a nonlinear system, but its form is virtually linear (it is linear in the consequent parameters \boldsymbol{\theta}).

Thus, various approaches have been proposed to handle nonlinear systems with linear-system techniques by exploiting the linear property carried by each rule, such as common-P stability analysis, LMI design procedures, adaptive fuzzy control, etc.


Outline

Introduction
Fuzzy Systems
Neural Networks: Machine Learning; Neural Network Models; Learning Analysis
Genetic Algorithms
Epilogue


Why Learning is Needed

The problem-domain knowledge for a complicated system usually does not exist or is extremely difficult to obtain.

The system may instead be asked to learn knowledge from experience by itself.

Note that learning is an important capability for an intelligent system, but it is not strictly necessary.

Still, as can be seen in recent research, most intelligent systems have been equipped with the learning capability.


What is Learning?

There are two important definitions for learning:

H. Simon defined learning as – “any change in a system that allows it to perform better the second time on the repetition of the same task or on another task drawn from the same population.”

B. Kosko defined learning as change in all cases. “A system learns if and only if the system parameter vector or matrix has a nonzero time derivative.”


Concept of Machine Learning

The first definition asks that a system with learning should always behave better as learning continues.

The second definition is mainly for numerical learning.

The fundamental problem for learning is how to change the system so that it behaves as required; the procedures that do so are the so-called learning algorithms.


Symbolic Learning vs. Numerical Learning

In a symbolic learning scheme, the representation of knowledge is symbolic, such as the predicate calculus and rules. The learning behavior is to build a conceptual relationship between those symbols from learned examples.

In a numerical learning scheme, the knowledge is somehow coded into numerical data. The learning behavior is concerned with changing the values of parameters numerically.


Symbolic Learning

Examples of symbolic learning schemes: Inductive Learning, Case-based Learning, Explanation-based Learning, etc.

Symbolic learning is well suited to interacting with human experts, but it is very sensitive to noise.

The major drawback of this kind of learning is that the knowledge manipulation is very complicated.

Traditional artificial intelligence focused on symbolic learning. However, due to the difficulty of manipulation and the sensitivity to noise, symbolic learning did not actually provide significant advances in real-world applications.


Numerical Learning

Examples of numerical learning schemes: Neural Networks, Cerebellar Model Arithmetic Computer (CMAC), Fuzzy Modeling, etc.

Numerical learning is computationally efficient and insensitive to noise, but incomprehensible. It is easy to use, but it is difficult to incorporate expert knowledge.

Recently, with the use of neural networks and fuzzy systems, numerical learning has drawn more attention.

Applications of numerical learning schemes can be found in various disciplines, such as artificial intelligence, computer science, control engineering, decision theory, expert systems, operations research, pattern recognition, and robotics.


Concept of Learning

Depending on the type of information used in determining how to change the system, learning schemes are usually categorized into three kinds of learning:

supervised learning, unsupervised learning, and reinforcement learning.

Reinforcement learning is sometimes also said to be supervised learning, but with less instructive supervision.


Concept of Learning

In fact, most successful learning approaches are supervised, due to the simplicity of the required task.

Unsupervised learning is used for finding common features or for clustering (self-organizing).

Reinforcement learning is attractive in its ideas, but due to the intricacy of its learning (such as delayed rewards and the decoupling between two learning systems), more study must be conducted.


Outline

Introduction
Fuzzy Systems
Neural Networks: Machine Learning; Neural Network Models; Learning Analysis
Genetic Algorithms
Epilogue


Introduction of Neural Networks

Artificial neural networks (ANN), or simply neural networks (NN), are systems inspired by modeling the networks of biological neurons in the brain.

NN are a promising new generation of information-processing systems that demonstrate the ability to learn, recall, and generalize from training patterns or data.


Typical Biological Neuron and Its Model


Introduction of Neural Networks

NN have a large number of highly interconnected processing elements (PE) or neurons that usually operate in parallel.

NN are good at tasks such as pattern matching and pattern classification, function approximation, optimization, vector quantization, and data clustering. However, traditional computers are faster in algorithmic computational tasks and precise arithmetic operations.


Introduction of Neural Networks

Since neural networks do not use a mathematical model of how a system’s output depends on its input (so-called model-free estimator), neural network architectures can be applied to a wide variety of problems.

Like brains, neural networks recognize patterns we cannot define. This is the property of recognition without definition.


Introduction of Neural Networks

An NN is a parallel distributed information-processing structure with the following characteristics:

- It is a neurally inspired mathematical model.
- It consists of a large number of highly interconnected processing elements (neurons).
- Its connections (weights) hold the knowledge.


Introduction of Neural Networks

- A neuron can dynamically respond to its stimulus, and the response completely depends on its local information.

- It has the ability to learn, recall, and generalize from training data by assigning or adjusting the connection weights.

- Its collective behavior demonstrates the computational power, and no single neuron carries specific information (distributed representation property).


Basic Models of Neural Networks

Models of ANNs are specified by three basic entities:

1. Neuron models: how a neuron processes its inputs and how its output is generated.
2. Connectivity: how the neurons are interconnected.
3. Learning algorithms: how the connection weights are updated so that the network behaves as required.


Basic Models of Neural Networks

The processing in a neuron is separated into two parts: input and output.

Associated with the input of a neuron is an integration function f, which serves to combine information, activation, or evidence from an external source or other neurons into a net-input to the neuron.

The most commonly used integration function is linear and is written as

f_i = net_i = \sum_{j=1}^{m} w_{ij} x_j - \theta_i, \quad \text{for } i = 1, 2, \ldots, n,

where \theta_i is the threshold of the i-th neuron.
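A minimal sketch of this neuron model for one layer of neurons; the logistic sigmoid activation and the numerical weights are illustrative assumptions.

```python
import numpy as np

def neuron_outputs(x, W, theta):
    """Forward pass of one layer of neurons: linear integration
    net_i = sum_j w_ij * x_j - theta_i, followed by a sigmoid activation."""
    net = W @ x - theta                  # integration function (net input)
    return 1.0 / (1.0 + np.exp(-net))    # activation function (logistic sigmoid)

W = np.array([[0.2, -0.5, 0.1],          # weights of 2 neurons with 3 inputs
              [0.7,  0.3, -0.4]])
theta = np.array([0.1, -0.2])            # thresholds
x = np.array([1.0, 0.5, -1.0])
print(neuron_outputs(x, W, theta))
```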


Basic Models of Neural Networks

The output function of a neuron is usually called the activation function, because the output of a neuron serves to activate the meaning stored in the neuron.


Learning Rules for Neural Networks


Learning in Neural Networks

As we have mentioned, the basic characteristic of ANNs is that they have the capability of learning.

Iterative learning procedures are used for a variety of ANN architectures.

Learning in ANNs can be accomplished in several ways: establishment of connections between neurons; adjustment of the weight values on the links; adjustment of threshold values in neurons.

In fact, these processes can all be considered as the adjustment of weight values on the links.


Learning in Neural Networks

The backpropagation (BP) learning algorithm is usually applied for learning. Such networks are also referred as backpropagation networks.

The fundamental idea is that a cost function E(w) is defined, such as

E(\mathbf{w}) = \frac{1}{2} \sum_{k=1}^{p} \left( d^{(k)} - y^{(k)} \right)^{2},

and the updating algorithm is gradient descent, \Delta\mathbf{w} = -\eta \nabla E(\mathbf{w}), or

\Delta w_j = -\eta \frac{\partial E}{\partial w_j} = \eta \sum_{k=1}^{p} \left( d^{(k)} - y^{(k)} \right) \frac{\partial y^{(k)}}{\partial w_j}.

Since this process updates the weights only after all training patterns have been taken into account, this kind of learning is called batch learning.


Learning in Neural Networks

It can be found that when batch learning is used, the errors of all training patterns are summed together, so the learning effect is determined by the sum over all training patterns. Thus, the learning cannot make adjustments for individual patterns, and the resultant learning is usually unacceptable.

The other kind of learning is called on-line learning or per-example learning. In this type of learning, the changes are made individually for each pattern; i.e.,

\Delta w_j = \eta \left( d^{(k)} - y^{(k)} \right) \frac{\partial y^{(k)}}{\partial w_j}.
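A minimal sketch contrasting the two update styles on a single linear neuron (so that ∂y/∂w = x); the data, learning rate, and "true" weights are illustrative assumptions.

```python
import numpy as np

# Minimal sketch: batch vs. on-line gradient descent for one linear neuron,
# y = w . x, so that dy/dw = x and the updates match the formulas above.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
d = X @ np.array([1.0, -2.0, 0.5])       # targets from an assumed "true" weight vector
eta = 0.05

# Batch learning: one update from the error summed over all patterns.
w = np.zeros(3)
for _ in range(200):
    y = X @ w
    w += eta * (d - y) @ X / len(X)      # delta_w = eta * sum_k (d_k - y_k) * x_k (averaged)

# On-line (per-example) learning: one update per pattern.
w_online = np.zeros(3)
for _ in range(20):
    for x_k, d_k in zip(X, d):
        y_k = w_online @ x_k
        w_online += eta * (d_k - y_k) * x_k

print(w, w_online)                       # both approach the assumed true weights
```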


Learning in Neural Networks

When we want an NN to perform some task, the NN is realized by finding an appropriate set of weights. In other words, the obtained weights capture what we want the NN to be, i.e., the knowledge.

The activation values of the neurons represent the system at some time snapshot. Thus, they capture the transition state for a specific input set at a certain time.

From the information storage viewpoint, the weights of the links encode the so-called long-term memory and the activation states of neurons encode the short-term memory in the NN.


Outline

Introduction
Fuzzy Systems
Neural Networks: Machine Learning; Neural Network Models; Learning Analysis
Genetic Algorithms
Epilogue


Universal Approximator Theorem

A neural network with as few as one hidden layer, using an arbitrary squashing activation function and a linear or polynomial integration function, can approximate virtually any function of interest to any desired degree of accuracy, provided sufficiently many hidden neurons are available.

Any lack of success in applications may arise from inadequate learning, an insufficient number of hidden neurons, or a lack of deterministic relationships between the inputs and the desired outputs.

The theorem only states the existence of the ideal network; it does not provide any mechanism for finding it.


Learning Performance Analysis

Two types of learning phases must be distinguished in the evaluation of learning performance, especially for offline learning schemes: the training phase and the testing phase.

In the training phase, the system is trained with the given training patterns. Thus, in this phase the system is under construction and the convergence behavior of the training is of concern.

For the training performance, the convergence behavior is of concern, and it is simplest to consider the learning histories (training errors vs. training iterations).


Learning Performance Analysis

The learning convergence behavior is usually characterized by two properties: the convergence speed and the converged error (training error).

If the system uses an offline learning scheme, the convergence speed may not be a significant factor.

An issue with the converged error is that learning may become stuck in local minima when iterative (incremental) learning algorithms are used.


Learning Performance Analysis

Even though the learning algorithm is the major factor in determining the convergence behavior, other factors, such as the system structure and the quality of the training data, may also affect the training performance.

The learning performance of the training phase is to state how accurately the learned system can approximate the desired outputs for a given input in the training data set.

The purpose of learning is to obtain a system that after learning can somehow have a fair chance to behave as required for any input data or in short, to generalize.


Learning Performance Analysis

Thus, in the testing phase, the generalization capability is of concern; that is, whether the learned system can interpret unlearned patterns well.

In the testing phase, the learned system is tested by another set of patterns, which are not used in the training phase in any way, to define the generalization errors.

The performance in this phase is usually referred to as the generalization capability.


Validation of Generalization

There are several methods for estimating generalization errors:

Split-sample validation: randomly select part of the data as a test set, which must not be used in any way during training (the most commonly used method).

Cross-validation: resample the training data set. In k-fold cross-validation, the data are divided into k subsets of equal size; the network is then trained k times, each time leaving out one of the subsets and using only the omitted subset to compute the error criterion. When k equals the number of data points, this is called "leave-one-out" cross-validation.

Bootstrapping: instead of leaving out subsets of the data, sub-samples are randomly drawn from the data with replacement. It seems to work better than cross-validation.
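A minimal k-fold cross-validation sketch; the data, the linear least-squares learner, and the mean-squared-error criterion are illustrative assumptions.

```python
import numpy as np

def k_fold_mse(X, y, k, train_fn, predict_fn):
    """k-fold cross-validation: train on k-1 folds, measure error on the held-out
    fold, and average the k error estimates."""
    idx = np.random.permutation(len(X))
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = train_fn(X[train], y[train])
        pred = predict_fn(model, X[test])
        errors.append(np.mean((y[test] - pred) ** 2))
    return np.mean(errors)

# Illustrative use with a linear least-squares "model" as the learner.
X = np.random.randn(60, 3)
y = X @ np.array([1.0, 0.5, -2.0]) + 0.1 * np.random.randn(60)
fit = lambda A, b: np.linalg.lstsq(A, b, rcond=None)[0]
pred = lambda w, A: A @ w
print(k_fold_mse(X, y, k=5, train_fn=fit, predict_fn=pred))
```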


Learning Performance Analysis

In general, there are two different types of generalization: interpolation and extrapolation.

Interpolation can often be done reliably, but extrapolation is notoriously unreliable.

Note that generalization is not always possible for various learning systems despite the assertions in the literature.


Learning Performance Analysis

There are three conditions that are typically necessary (although not sufficient) for good generalization.

Deterministic input-output relationships: The inputs to the network contain sufficient information pertaining to the desired outputs. It is impossible to learn a non-existent function.

Smooth functions: A small change in the inputs should produce a small change in the outputs. Very non-smooth functions (e.g. random noise) cannot be generalized.

Sufficient training data: The used training data should be a sufficiently large and representative subset of the population. Sufficient data can avoid extrapolation.


Overfitting and Underfitting

A system that is not sufficiently complex (i.e., one with fewer tunable parameters than required) may fail to fully detect the signal in a complicated data set, leading to underfitting.

A network that is too complex may fit not only the signal but also the noise, leading to overfitting.

Note that overfitting may occur even with noise-free data.

Various approaches have been proposed in the literature to deal with this: jittering, weight decay, early stopping, Bayesian learning, robust learning algorithms, etc.


Local Learning Concept

The minimum disturbance principle suggests that a better way of learning should aim not only at reducing the output error for the current training pattern but also at minimizing the disturbance to the weights that have already been learned.

A learning system following the minimum disturbance principle can learn more effectively. We refer to this as the local learning concept.


Local Learning Concept

In neural networks, the updating effects spread to all weights in the network because of the distributed knowledge representation. This violates the minimum disturbance principle and is called global learning.

Local learning can be more effective, but it may not always learn better.

Neural fuzzy systems use spatial relations to define a learning structure that facilitates the local learning concept.


Network Structure for Fuzzy Systems

In this kind of approach, fuzzy models are characterized by a set of parameters, such as the centers and widths in membership functions, the rule relationships, etc.

Since those parameters can be viewed as the weights in a network, the traditional learning schemes for neural networks then can be adopted to this fuzzy modeling problem.

These kinds of approaches are often referred to as neural fuzzy systems or neural-network-based fuzzy systems.


Local Learning Concept

It can be found that neural fuzzy systems can always have better learning capability than that of neural networks.

Since local learning may restrict the learning to the pre-defined relations in order to reduce the learning burden, if those relations are not correct or cannot reflect certain information, the effect of local learning may not be acceptable.

Several systems can be classified as local learning systems, such as radial basis function networks, wavelet networks, CMAC, etc.


Outline

Introduction
Fuzzy Systems
Neural Networks
Genetic Algorithms: Optimization in Computational Intelligence; Evolutionary Computation; Other Non-derivative Optimization
Epilogue


Optimization in Computational Intelligence

Optimization processes are required in an intelligent system because:

a better selection of applicable knowledge or strategies can result in better performance;

in the learning process, an optimal way of defining the updating rule is required.

In general, an optimization problem requires finding a setting of the variable vector of the system such that a certain quality criterion, called a performance function, is optimized. Sometimes the variable vector must also satisfy some constraints.


Optimization in Computational Intelligence

Traditional optimization approaches develop a formal model that resembles the original function and then solve it by means of traditional mathematical methods.

Evolutionary algorithms have been widely used in various intelligent systems. In fact, combined with fuzzy systems and neural networks, many applications can be found in the literature.


Evolutionary Computation

An important property of evolutionary algorithms is that the search process does not require auxiliary forms of the fitness function, such as derivatives.

In fact, evolutionary computation should be understood as a general adaptable concept for problem solving rather than a collection of related and ready-to-use algorithms.


Outline

Introduction
Fuzzy Systems
Neural Networks
Genetic Algorithms: Optimization in Computational Intelligence; Evolutionary Computation; Other Non-derivative Optimization
Epilogue


Evolutionary Computation

The majority of current implementations of evolutionary algorithms descend from three strongly related but independently developed approaches:

Genetic algorithms: use a binary representation as the genes and search for an optimal chromosome.

Evolutionary programming: evolves finite state machines to predict events on the basis of former observations.

Evolution strategies: solve difficult discrete and continuous parameter optimization problems.


Evolutionary Computation

Evolutionary computation mimics the natural selection process so as to find the best-fitted candidate solution (optimization).

Evolutionary algorithms can be viewed as optimization approaches that use random search algorithms with some guidance.


Evolutionary Computation

The guidance is fulfilled by a user-specified fitness function.

In general, an optimization problem requires finding a setting of the variable vector of the system such that a certain quality criterion, called a performance function, is optimized. Sometimes the variable vector must also satisfy some constraints.


Evolutionary Computation

The basic evolutionary loop:

1. Initialize the population P(t).
2. Evaluate P(t).
3. If the stop criterion is satisfied, stop.
4. Apply reproduction and crossover on P(t) to yield C(t).
5. Apply mutation on C(t) to yield D(t), and then evaluate D(t).
6. Select P(t+1) from P(t) and D(t) based on the fitness, and return to step 3.
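A minimal sketch of this loop for binary chromosomes; the tournament selection, the operator probabilities, and the "one-max" fitness function are illustrative assumptions.

```python
import random

def genetic_algorithm(fitness, n_bits=16, pop_size=30, generations=100,
                      p_crossover=0.9, p_mutation=0.02):
    """Minimal genetic algorithm over binary chromosomes: reproduction (tournament
    selection), one-point crossover, bit-flip mutation, and elitist survival."""
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            p1, p2 = (max(random.sample(pop, 3), key=fitness) for _ in range(2))
            c1, c2 = p1[:], p2[:]
            if random.random() < p_crossover:            # one-point crossover
                cut = random.randrange(1, n_bits)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for c in (c1, c2):                            # bit-flip mutation
                for i in range(n_bits):
                    if random.random() < p_mutation:
                        c[i] = 1 - c[i]
            children += [c1, c2]
        pop = sorted(pop + children, key=fitness, reverse=True)[:pop_size]
    return max(pop, key=fitness)

# Illustrative fitness: "one-max" (count of 1 bits).
best = genetic_algorithm(fitness=sum)
print(best, sum(best))
```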


Evolutionary Computation

Evolutionary computation uses three basic operators to manipulate the genetic composition (chromosomes) of a population:

Reproduction is the process of selecting parents for generating offspring. The most highly rated chromosomes in the current generation are the most likely to be copied into the new generation.

Crossover provides a mechanism for chromosomes to mix and match attributes through random processes.

Mutation changes attributes (genes) in the new generation to bring in new possibilities. Mutation is a very important mechanism for avoiding local minima in the optimization search.


Evolutionary Computation

The above operations generate the new chromosomes for evolution; hopefully, the best-fitted solution can be generated.

Besides, randomness plays an essential role in those operations.

One attractive property of evolutionary algorithms is that the performance of the best retained solution keeps improving over the generations.


Evolutionary Computation

However, because the algorithms must be adapted to the problem at hand, the operations of evolutionary algorithms must be designed by the users.

Moreover, if the optimization is constrained, the initial population and the generation of new chromosomes must be chosen carefully.


Outline

Introduction
Fuzzy Systems
Neural Networks
Genetic Algorithms: Optimization in Computational Intelligence; Evolutionary Computation; Other Non-derivative Optimization
Epilogue


Other Non-derivative Optimization

Other often-mentioned approaches are ant algorithms (ACS, ACO, etc.) and Particle Swarm Optimization (PSO).

The overall ideas are all similar: they use fitness values to guide the search, with some random mechanism built into the search process.

Usually, these approaches can achieve better search performance than genetic algorithms.
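A minimal PSO sketch (minimization) follows; the inertia and acceleration coefficients, the search bounds, and the sphere test function are illustrative assumptions.

```python
import numpy as np

def pso(objective, dim=2, n_particles=20, iterations=100, w=0.7, c1=1.5, c2=1.5):
    """Minimal particle swarm optimization (minimization): each particle is pulled
    toward its own best position and the swarm's best position, plus inertia."""
    rng = np.random.default_rng(1)
    x = rng.uniform(-5, 5, (n_particles, dim))     # positions
    v = np.zeros((n_particles, dim))               # velocities
    pbest = x.copy()
    pbest_val = np.array([objective(p) for p in x])
    gbest = pbest[np.argmin(pbest_val)].copy()
    for _ in range(iterations):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = x + v
        vals = np.array([objective(p) for p in x])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest, objective(gbest)

print(pso(lambda p: np.sum(p ** 2)))               # sphere function as an assumed test case
```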


Other Non-derivative Optimization

This is because genetic algorithms perform a solution-wise search, whereas swarm search algorithms perform a component-wise search.

Also, it can be found that genetic algorithms are more easily trapped in a local minimum if the initial population has some locally optimal properties.

Swarm algorithms can escape from such an initial local optimum more easily.


Epilogue

Computational intelligence is a new vehicle for the next generation of artificial intelligence.

Nevertheless, computational intelligence alone can bring you nowhere.

Incorporating it with other techniques may create new frontiers for our dreams.


Thank you for your attention!

Any questions?!

Shun-Feng Su, Professor, Department of Electrical Engineering, National Taiwan University of Science and Technology
E-mail: [email protected]