March, 2009
®Copyright of Shun-Feng Su
1
The Essence of Computational Intelligence
計算型智慧的基本概念
Offered by 蘇順豐 Shun-Feng Su
E-mail: [email protected]
Department of Electrical Engineering, National Taiwan University of Science and Technology
Preface
People have always dreamed of having machines that can act like humans.
Artificial Intelligence studies the components that can facilitate such a dream.
Due to the nature of knowledge, traditional artificial intelligence uses symbols to construct the conceptual world.
Preface
Symbolic artificial intelligence is very difficult to manipulate for real-world problems, especially for implementing common-sense knowledge.
Recently, computational intelligence (CI) has become commonly used and has demonstrated good performance in various applications.
CI is so named to distinguish it from traditional symbolic artificial intelligence by its ease of manipulation through numerical knowledge representation.
Preface
The following three methodologies are often considered as CI: Fuzzy Systems, Neural Networks, and Genetic Algorithms (also referred to as Evolutionary Computation).
This talk provides the fundamental concepts and ideas behind these often-mentioned techniques.
Basics for CI
CI is known to have the following characteristics [1]:
Numerical knowledge representation; adaptability; fault tolerance; fast processing speed; error rate optimality.
[1] J. C. Bezdek, “What is computational intelligence?” Computational Intelligence: Imitating Life, J. M. Zurada, R. J. Marks II, and C. J. Robinson, Eds., New York: IEEE Press, pp. 1–12, 1994.
Basics for CI
Possible advantages of using CI are:
Efficiency; Robustness; Good generalization capability; Easy to use; Easy to incorporate problem domain heuristics; Superior performance in various applications.
Basics for CI
Generalization capability means having a fair chance to behave as required for any input data.
Basics for CI
Possible problems encountered while using CI are:
Incomprehensible knowledge; lack of theoretical analysis tools (e.g., for stability, performance guarantees, etc.); various subjective parameters required; lack of benchmarks for performance evaluation.
These may be disadvantages, but they sometimes still provide good means for applications.
Outline
Introductions
Fuzzy Systems: Uncertainty and Its Representation; Fuzzy Operations and Uncertainty Reasoning; Fuzzy Logic Control
Neural Networks
Genetic Algorithms
Epilogue
Introduction of Fuzzy Systems
Fuzzy systems have been widely used in various applications.
In fact, the fundamental idea behind fuzzy systems is to include uncertainty in the process.
Such an inclusion provides extra information so that the systems can be more accurate.
In other words, fuzziness means vagueness, but it can provide accuracy because of this extra information.
Uncertainties in Intelligent systems
Uncertainties exist for the following reasons:
noise always exists in the environment;
facts being true or events occurring may not be certain;
stored knowledge is incomplete or liable to change;
exceptions are inevitable for any realistic knowledge;
simplifications are necessary to reduce the complexity of the system;
partitions of continuous variables for rule-based knowledge result in the fuzzy set concept.
Uncertainties in Intelligent systems
Traditional systems always use nominal values to reason and make decisions. But using more information can yield more accurate decision making.
Thus, to act intelligently, those uncertainties cannot be ignored in computing.
To incorporate uncertainties in the decision making process, the system must be capable of representing uncertainty and also be equipped with the capability of approximate reasoning.
Fuzzy Sets As A Representation for Uncertainty
The traditional sets are called classical sets or crisp sets.
In a crisp set, membership is crisp and can be described by a simple yes/no answer. That is, an element is either in the set or not in the set. The membership function of A is defined as
μ_A(x) = 1, when x ∈ A,
μ_A(x) = 0, when x ∉ A.
Fuzzy Sets As A Representation for Uncertainty
The range of the membership function μ_A of a fuzzy set A is now the interval [0,1] instead of only the binary values {0,1}.
Example: Let a fuzzy set A represent the concept “real numbers that are close to 5” and the membership function for A is
μ_A(x) = 1 / (1 + 10(x − 5)²)
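As a minimal sketch, the membership function above can be coded directly in Python:

```python
# Membership function for the fuzzy set A = "real numbers close to 5",
# using the definition mu_A(x) = 1 / (1 + 10*(x - 5)^2).
def mu_A(x):
    return 1.0 / (1.0 + 10.0 * (x - 5.0) ** 2)

print(mu_A(5.0))   # 1.0: 5 is fully "close to 5"
print(mu_A(6.0))   # ~0.0909
print(mu_A(10.0))  # ~0.004: far from 5, almost no membership
```

The degree decays smoothly with distance from 5, which is exactly the graded yes/no a crisp set cannot express.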
Fuzzy Sets As A Representation for Uncertainty
For example, with membership functions M and F (say, for “medium” and “fast” speed):
When x = 62mph, M(x)=0.4667, F(x)=0.5333.
When x = 63mph, M(x)=0.5333, F(x)=0.4667.
When x = 69mph, M(x)=0.0667, F(x)=0.9333.
Uncertainty Representations
Two often-used uncertainty representations are fuzzy sets and probability. From the viewpoint of the uncertainty concept per se, they capture two different types of uncertainty.
A fuzzy set captures the idea of vagueness: the degree of uncertainty about what something is.
What is rain? What is fast?
Probability captures the idea of ambiguity: uncertainty about whether something occurs.
Will it rain? What will the outcome of a die roll be?
Fuzzy vs. Probability
From the mathematical representation viewpoint, they are comparable and possess different reasoning behaviors.
Reasoning with probabilities is mathematically sound but difficult to manipulate because it lacks modularity.
Reasoning with fuzzy sets does not provide mathematically sound inference and is subjective, but it is easy to manipulate.
In fact, other types of uncertainty can be found in the literature.
Outline
Introductions
Fuzzy Systems: Uncertainty and Its Representation; Fuzzy Operations and Uncertainty Reasoning; Fuzzy Logic Control
Neural Networks
Genetic Algorithms
Epilogue
Operations on Fuzzy Sets – The Extension Principle
Given a function f : U → V, suppose now the input domain is a fuzzy set A in U. What will be the output?
The extension principle states that the fuzzy degree of x in A will be the fuzzy degree of y = f(x) in f(A).
The concept is to pass the membership degree of x to f(x); i.e., the function itself is crisp and will not introduce any uncertainty. Thus, the membership degree of x will truly appear for f(x).
Extension Principle
Two problems arise:
f(x) may be a many-to-one function, i.e., f(x1) = f(x2) with x1 ≠ x2. Then the membership degree could be μ(x1) or μ(x2); the resultant membership degree is taken as max(μ(x1), μ(x2)).
The input domain may consist of multiple variables. Then f(x1, x2, …, xn) is obtained only when all of x1, x2, …, xn appear; the membership degree of f(x1, x2, …, xn) is min(μ(x1), μ(x2), …, μ(xn)).
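A small Python sketch of the extension principle on a discrete universe; the example fuzzy set and function below are illustrative, not from the slides:

```python
# Extension principle on a discrete universe: for y = f(x), take
# mu_B(y) = max over all x with f(x) = y of mu_A(x)  (max for many-to-one).
def extend(f, A):
    """A: dict mapping x -> membership degree; returns B = f(A) as a dict."""
    B = {}
    for x, deg in A.items():
        y = f(x)
        B[y] = max(B.get(y, 0.0), deg)  # many-to-one case: take the max degree
    return B

A = {-2: 0.3, -1: 0.7, 0: 1.0, 1: 0.7, 2: 0.3}   # fuzzy set "close to 0"
B = extend(lambda x: x * x, A)                    # y = x^2 is many-to-one
print(B)  # {4: 0.3, 1: 0.7, 0: 1.0}
```

Note how x = −1 and x = 1 both map to y = 1, and the resulting degree is max(0.7, 0.7) = 0.7, exactly the many-to-one rule above.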
Extension Principle
The extension principle allows the generalization of crisp mathematical concepts to the fuzzy set framework, extending point-to-point mappings to mappings between fuzzy sets.
It provides a means for any function f that maps an n-tuple (x1, x2, …, xn) in the crisp set U to a point in the crisp set V to be generalized to map n fuzzy subsets of U to a fuzzy subset of V.
Any mathematical relationship between non-fuzzy elements can thus be extended to deal with fuzzy entities.
Classic Logic Reasoning
Logic reasoning finds other true propositions (facts) from given true propositions (knowledge and/or facts).
The scenario of logic reasoning can be interpreted as follows: there is a knowledge base containing facts and rules; a new piece of information or a description of the current situation is specified; then we want to find out what the system can conclude, or which action should be taken, under the current circumstances.
The traditional reasoning rule is called modus ponens: (A ∧ (A→B)) → B. That is, one piece of knowledge A→B and a fact A result in the fact B.
Approximate Reasoning for Fuzzy sets
The most used inference rule is (A1 ∧ (A2→B)) → B. In classic logic, either A1 = A2 or A1 ≠ A2. Therefore, with the match-and-fire property, either B is concluded or it is not.
But with the use of fuzzy sets, A1 or A2 (or both) is a fuzzy set. Then what can the reasoning process conclude?
Example: (speed = 95 km/hr) ∧ (speed is too fast → pull back the throttle). Should the throttle be pulled back?
In the most common cases, A2 is a fuzzy set and A1 is a fuzzy singleton (a crisp value). Note that B can be a crisp value or a fuzzy set. However, the rule A2→B is hardly ever fuzzy.
Approximate Reasoning for Fuzzy sets
The most used reasoning format is one of the categorical reasoning schemes, called the compositional rule of inference or the generalized modus ponens:
(X is A) and (IF (X is B) then (Y is C)) results in (Y is AR),
where X and Y are fuzzy variables and A, B, and C are fuzzy labels (sets). Note that the resultant AR for Y is a fuzzy set. Usually, the membership function of AR is computed as
μ_AR(v) = max_u min( μ_A(u), t(μ_B(u), μ_C(v)) ).
Approximate Reasoning for Fuzzy sets
The above result can be viewed as an application of the extension principle.
In μ_AR(v) = max_u min( μ_A(u), t(μ_B(u), μ_C(v)) ), finding whether v is in Y involves the selection among the various u (the max operation), the existence of x = u in A, and (the min operation) the relation between x = u and y = v.
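The compositional rule can be sketched in Python on small discrete universes; the universes and membership values below are made up for illustration, with min as the t-norm:

```python
# Generalized modus ponens with max-min composition:
# mu_AR(v) = max over u of min(mu_A(u), min(mu_B(u), mu_C(v))).
def gmp(mu_A, mu_B, mu_C, U, V):
    result = {}
    for v in V:
        result[v] = max(min(mu_A[u], min(mu_B[u], mu_C[v])) for u in U)
    return result

U = [0, 1, 2, 3]                           # input universe
V = [0, 1, 2]                              # output universe
mu_A = {0: 0.0, 1: 1.0, 2: 0.5, 3: 0.0}    # fact: "X is A"
mu_B = {0: 0.0, 1: 0.5, 2: 1.0, 3: 0.5}    # rule antecedent B
mu_C = {0: 0.2, 1: 0.6, 2: 1.0}            # rule consequent C
print(gmp(mu_A, mu_B, mu_C, U, V))  # {0: 0.2, 1: 0.5, 2: 0.5}
```

The output fuzzy set is the consequent C clipped at the degree to which the fact A matches the antecedent B (here 0.5).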
Approximate Reasoning for Fuzzy sets
Note that from the logic viewpoint, the implication p→q is equivalent to ¬p∨q. However, this equivalence only states that the logic of p→q equals that of ¬p∨q. In the reasoning, the logic of the implication is assumed to be true, and the question is whether the current situation (x = u and y = v) matches the rule IF (X is B) then (Y is C).
Therefore, the most commonly used relation is to compute the t-norm of μ_B(u) and μ_C(v).
Approximate Reasoning for Fuzzy sets
Example: R1: IF (X is A1) and (Y is B1) then (Z is C1). R2: IF (X is A2) and (Y is B2) then (Z is C2).
Now, for an input (x0, y0), the reasoning can be shown graphically: each rule fires with strength min(μ_Ai(x0), μ_Bi(y0)), its consequent Ci is clipped at that strength, and the clipped sets are combined by max.
Outline
Introductions
Fuzzy Systems: Uncertainty and Its Representation; Fuzzy Operations and Uncertainty Reasoning; Fuzzy Logic Control
Neural Networks
Genetic Algorithms
Epilogue
Fuzzy Logic Control
A Fuzzy Logic Controller (FLC) is a controller described by a collection of fuzzy rules (e.g. IF-THEN rules) involving linguistic variables.
The original idea behind fuzzy control is to incorporate human “expert experience” into the design of controllers.
The utilization of linguistic variables, fuzzy control rules and approximate reasoning provides a means to incorporate human expert experience in designing the controller.
Rationale behind Fuzzy Logic Control
In an FLC, the rule structure provides the adaptation among strategies, and the fuzzy mechanism provides the interpolating capability among rules.
With this interpolating capability, the transition between rules is gradual rather than abrupt: the so-called softening process.
But in recent developments, fuzzy control is used because it consists of multiple strategies (rules or controllers) for different situations, which of course can give better control performance than one complicated controller.
Basic Structure of Fuzzy Logic Control
A typical architecture of an FLC consists of four principal components: a fuzzifier, a fuzzy rule base, an inference engine, and a defuzzifier.
Fuzzy Logic Control
• Knowledge usually is in a rule structure, and rule structures need a partition of the input space.
• Fuzzy control uses fuzzy partition.
Fuzzy Logic Control
With a fuzzy partition:
The consequences of all matched rules must be transformed into actions.
To use fuzzy rules, the input values must be transformed into fuzzy labels.
Fuzzy Logic Control
Such a rule-based system with a fuzzy partition, in which input values are mapped into fuzzy labels and the consequences of all matched rules are transformed into actions, is also referred to as a fuzzy system.
Basic Structure of Fuzzy Logic Control
The fuzzifier is to transform crisp measured data (e.g., speed=100Km/hr) into suitable linguistic labels (e.g. speed is too fast).
The fuzzy rule base stores the knowledge in rule forms about how to control the system to be controlled (e.g., IF “speed is too low” THEN “increase the throttle setting”).
Basic Structure of Fuzzy Logic Control
The inference engine is to infer desired control strategies from rules by performing approximate reasoning based on current states.
The defuzzifier is to yield a non-fuzzy action or decision from the inferred control strategy (a fuzzy set) by the inference engine.
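The four components can be sketched end to end for a single-input controller. The membership shapes, rule actions, and numbers below are illustrative assumptions, not the slides' example:

```python
# A minimal single-input FLC sketch: fuzzify a speed error, fire two rules,
# and defuzzify by a weighted average of the rule outputs.

def tri(x, a, b, c):
    """Triangular membership function rising from a to a peak at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def flc(error):
    # Fuzzifier: degrees for "negative" and "positive" speed error.
    neg = tri(error, -20.0, -10.0, 0.0)
    pos = tri(error, 0.0, 10.0, 20.0)
    # Rule base: IF error is negative THEN throttle +5; IF positive THEN -5.
    strengths = [neg, pos]
    actions = [5.0, -5.0]
    # Defuzzifier: weighted average of the fired actions.
    total = sum(strengths)
    return sum(s * a for s, a in zip(strengths, actions)) / total if total else 0.0

print(flc(-10.0))  # 5.0: fully "negative", so increase the throttle
print(flc(5.0))    # -5.0: only "positive" fires
```

Between the rule peaks the output blends the two actions gradually, which is the softening behavior described earlier.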
Fuzzy Systems
Mamdani fuzzy rules :
If (X is A) and (Y is B) … then (Z is C)
Note that C is a fuzzy set.
TSK (in modeling) or TS (in control) fuzzy rules :
If (X is A) and (Y is B) … then Z=f(X,Y).
Now, f() is a crisp function.
Fuzzy Systems
Mamdani fuzzy rules : If (X is A) and (Y is B) … then (Z is C)
TSK (in modeling) or TS (in control) fuzzy rules : If (X is A) and (Y is B) … then Z=f(X,Y).
The approximate reasoning for the output of a fuzzy rule is obtained from extension principle as:
μ_AR(v) = max_u min( μ_A(u), t(μ_B(u), μ_C(v)) ).
Fuzzy Systems
Mamdani fuzzy rules: COA (center-of-area) defuzzification.
To find the center of the area, numerical integration is needed.
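As a sketch, COA can be computed with a simple Riemann sum; the triangular output set below is an illustrative assumption:

```python
# COA defuzzification by numerical integration: the crisp output is
# z* = integral(z * mu(z) dz) / integral(mu(z) dz), approximated by a
# midpoint Riemann sum over a sampled output universe.
def coa(mu, z_min, z_max, n=1000):
    dz = (z_max - z_min) / n
    zs = [z_min + (i + 0.5) * dz for i in range(n)]
    num = sum(z * mu(z) for z in zs) * dz
    den = sum(mu(z) for z in zs) * dz
    return num / den

# Symmetric triangular output set centered at 3 -> centroid at 3.
tri = lambda z: max(0.0, 1.0 - abs(z - 3.0))
print(round(coa(tri, 0.0, 6.0), 3))  # 3.0
```

The summation over many sample points is the computational cost that the TS form on the next slides avoids.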
Fuzzy Systems
TS fuzzy rules: the result is somewhat also called COA, but without numerical integration. It is obtained as
z = ( Σ_{i=1}^{m} α_i f_i ) / ( Σ_{i=1}^{m} α_i ),
where α_i and f_i are the firing strength and the fired result of the i-th rule, and m is the number of rules.
This is simple and easy to calculate. Most importantly, it can be used in any mathematical operations, such as taking derivatives.
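The TS weighted-average output is essentially a one-liner; the firing strengths and consequent values below are illustrative:

```python
# TS-style output: z = sum(alpha_i * f_i) / sum(alpha_i), with firing
# strengths alpha_i and fired consequents f_i; no numerical integration.
def ts_output(alphas, fs):
    num = sum(a * f for a, f in zip(alphas, fs))
    den = sum(alphas)
    return num / den

print(ts_output([0.2, 0.8], [1.0, 3.0]))  # 2.6
```

Being a ratio of sums, the expression is differentiable in its parameters, which is what makes it usable inside gradient-based and adaptive designs.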
Fuzzy Systems
Thus, it can be found that in recent developments, most approaches consider TS (or TSK) fuzzy models.
TS fuzzy models also have another advantage in applications: the output of a TS fuzzy model can be more sensitive to changes in the inputs.
This can eliminate the chattering effects in the final control stage that occur with traditional fuzzy models (Mamdani fuzzy rules).
Fuzzy System
A fuzzy approximator is constructed from a set of fuzzy rules of the form
R_F^l : IF x_1 is A_1^l and … and x_n is A_n^l THEN y is B^l, for l = 1, 2, …, M.
Generally, B^l is a fuzzy singleton.
In the literature, this fuzzy model can be regarded both as a Mamdani fuzzy model (with singleton consequent fuzzy sets) and as a TS fuzzy model (with a constant crisp function).
A commonly-used fuzzy model in control.
Fuzzy System
For the same rule form, R_F^l : IF x_1 is A_1^l and … and x_n is A_n^l THEN y is B^l, with B^l a fuzzy singleton:
To me, since no numerical integration is needed, it is a TS fuzzy model. Also, no membership functions are used in the consequences.
Fuzzy System
The fuzzy system with center-of-area–like defuzzification and product inference can be obtained as
y = f(x) = ( Σ_{l=1}^{M} f^l Π_{i=1}^{n} μ_{A_i^l}(x_i) ) / ( Σ_{l=1}^{M} Π_{i=1}^{n} μ_{A_i^l}(x_i) ),
where the product is the t-norm operation over all premise parts.
It is a universal function approximator and can be written as y = f(x) = θ^T ω(x).
Fuzzy System
This form, y = f(x) = θ^T ω(x), is what is used in adaptive fuzzy control.
It is simple and differentiable.
Note that ω is a function of the states.
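A sketch of y = θᵀω(x) for a scalar input, with Gaussian premise memberships and normalized basis functions; the centers, width, and θ values are illustrative assumptions:

```python
import math

# Fuzzy system y = theta^T * omega(x): Gaussian premise memberships,
# product inference, and singleton consequents theta_l.
def omega(x, centers, width=1.0):
    """Normalized fuzzy basis functions omega_l(x) for a scalar input x."""
    w = [math.exp(-((x - c) / width) ** 2) for c in centers]
    s = sum(w)
    return [wi / s for wi in w]

def fuzzy_system(x, theta, centers):
    # Linear in theta: this is why adaptive laws for theta are easy to derive.
    return sum(t * w for t, w in zip(theta, omega(x, centers)))

centers = [-1.0, 0.0, 1.0]   # premise membership centers
theta = [-2.0, 0.0, 2.0]     # singleton consequents (the adjustable parameters)
print(fuzzy_system(0.0, theta, centers))  # 0.0 by symmetry
```

Because the output is linear in θ, tuning θ is a linear estimation problem even though f(x) itself is nonlinear in x.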
Fuzzy System
It should be noted that the above system is nonlinear. But it can be seen that the form is virtually linear (linear in the parameters).
Thus, various approaches have been proposed to handle nonlinear systems with linear-system techniques, exploiting the linearity borne by each rule, such as common-P (Lyapunov) stability, LMI design processes, adaptive fuzzy control, etc.
Outline
Introductions
Fuzzy Systems
Neural Networks: Machine Learning; Neural Network Models; Learning Analysis
Genetic Algorithms
Epilogue
Why Learning is Needed
The problem domain knowledge for a complicated system usually does not exist or is extremely difficult to obtain.
The system may be asked to learn knowledge from experience by itself.
Note that learning is an important capability for an intelligent system, but not a necessary one.
As can be seen in recent research, most intelligent systems have been equipped with learning capability.
What is Learning?
There are two important definitions for learning:
H. Simon defined learning as – “any change in a system that allows it to perform better the second time on the repetition of the same task or on another task drawn from the same population.”
B. Kosko defined learning as change in all cases. “A system learns if and only if the system parameter vector or matrix has a nonzero time derivative.”
Concept of Machine Learning
The first definition asks that a system with learning should always behave better as learning continues.
The second definition is mainly for numerical learning.
The fundamental problem of learning is how to change the system to make the system behave as required.
Concept of Machine Learning
How to change the system is specified by the so-called learning algorithms.
Symbolic Learning vs. Numerical Learning
In a symbolic learning scheme, the representation of knowledge is symbolic, such as the predicate calculus and rules. The learning behavior is to build a conceptual relationship between those symbols from learned examples.
In a numerical learning scheme, the knowledge somehow is coded into numerical data. The learning behavior is concerned about changing the values of parameters numerically.
Symbolic Learning
Examples of symbolic learning schemes: Inductive Learning, Case-based Learning, Explanation-based Learning, etc.
Symbolic learning is well suited to interaction with human experts, but it is very sensitive to noise.
The major drawback of this kind of learning is that the knowledge manipulation is very complicated.
Traditional artificial intelligence focused on symbolic learning. However, due to the difficulty of manipulation and the sensitivity to noise, symbolic learning did not actually provide significant advances in real-world applications.
Numerical Learning
Examples of numerical learning schemes: Neural Networks, Cerebellar Model Arithmetic Computer (CMAC), Fuzzy Modeling, etc.
Numerical learning is computationally efficient and insensitive to noise, but incomprehensible. It is easy to use but difficult to incorporate expert knowledge into.
Recently, due to the use of neural networks and fuzzy systems, numerical learning has drawn more attention.
Applications of numerical learning schemes can be found in various disciplines, such as artificial intelligence, computer science, control engineering, decision theory, expert systems, operations research, pattern recognition, and robotics.
Concept of Learning
Depending on the type of information used in determining how to change the system, learning schemes are usually categorized into three kinds: supervised learning, unsupervised learning, and reinforcement learning.
Reinforcement learning is sometimes also said to be supervised learning, but with less instructive supervision.
Concept of Learning
In fact, most successful learning approaches are supervised learning, due to its simplicity in the required task.
Unsupervised learning is used for finding common features or for clustering (self-organizing).
Reinforcement learning is fantastic in its ideas, but due to its intricacy in learning (such as delayed reward and the decoupling between two learning systems), more study must be conducted.
Outline
Introductions
Fuzzy Systems
Neural Networks: Machine Learning; Neural Network Models; Learning Analysis
Genetic Algorithms
Epilogue
Introduction of Neural Networks
Artificial neural networks (ANN), or simply neural networks (NN), are systems inspired by modeling the networks of biological neurons in the brain.
NN are a promising new generation of information processing systems that demonstrate the ability to learn, recall, and generalize from training patterns or data.
Introduction of Neural Networks
NN have a large number of highly interconnected processing elements (PE) or neurons that usually operate in parallel.
NN are good at tasks such as pattern matching and pattern classification, function approximation, optimization, vector quantization, and data clustering. However, traditional computers are faster in algorithmic computational tasks and precise arithmetic operations.
Introduction of Neural Networks
Since neural networks do not use a mathematical model of how a system’s output depends on its input (so-called model-free estimator), neural network architectures can be applied to a wide variety of problems.
Like brains, neural networks recognize patterns we cannot define. This is the property of recognition without definition.
Introduction of Neural Networks
An NN is a parallel distributed information-processing structure with the following characteristics:
- It is a neurally inspired mathematical model.
- It consists of a large number of highly interconnected processing elements (neurons).
- Its connections (weights) hold the knowledge.
Introduction of Neural Networks
- A neuron can dynamically respond to its stimulus, and the response completely depends on its local information.
- It has the ability to learn, recall, and generalize from training data by assigning or adjusting the connection weights.
- Its collective behavior demonstrates the computational power, and no single neuron carries specific information (distributed representation property).
Basic Models of Neural Networks
Models of ANNs are specified by three basic entities:
1. Neuron Models: It describes how the neurons process the input and how the output is generated.
2. Connectivity: It defines how those neurons are interconnected.
3. Learning Algorithms: It defines how the connecting weights are updated to adjust the network so as to behave as required.
Basic Models of Neural Networks
The processing in a neuron is separated into two parts: input and output.
Associated with the input of a neuron is an integration function f, which serves to combine information, activation, or evidence from an external source or other neurons into a net-input to the neuron.
The most commonly used integration function is linear and written as:
f_i = net_i = Σ_{j=1}^{m} w_ij x_j − θ_i, for i = 1, 2, …, n,
where θ_i is the threshold of the i-th neuron.
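The linear integration function followed by an activation can be sketched as below; the sigmoid choice and the sample weights are assumptions for illustration:

```python
import math

# Net input of neuron i: net_i = sum_j w_ij * x_j - theta_i, followed by a
# sigmoid activation (one common choice of activation function).
def neuron(weights, x, theta):
    net = sum(w * xj for w, xj in zip(weights, x)) - theta
    return 1.0 / (1.0 + math.exp(-net))   # sigmoid squashes net into (0, 1)

print(neuron([0.5, -0.3], [1.0, 2.0], -0.1))  # net ~ 0, so output ~ 0.5
```

A zero net input gives an output of 0.5, the midpoint of the sigmoid; larger net inputs push the output toward 1.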
Basic Models of Neural Networks
The output function of a neuron is usually called the activation function in that the output of a neuron serves the role of activation of the meaning stored in the neuron.
Learning in Neural Networks
As we have mentioned, the basic characteristic of ANNs is that they have the capability of learning.
Iterative learning procedures are used for a variety of ANN architectures.
Learning in ANNs can be accomplished in several ways: establishment of connections between neurons; adjustment of the weight values on the links; adjustment of threshold values in neurons.
In fact, these processes can all be considered as the adjustment of weight values on the links.
Learning in Neural Networks
The backpropagation (BP) learning algorithm is usually applied for learning. Such networks are also referred to as backpropagation networks.
The fundamental idea is that a cost function E(w) is defined, such as
E(w) = (1/2) Σ_{k=1}^{p} (d^(k) − y^(k))²,
and then the updating algorithm is Δw = −η∇E(w), or
Δw_j = −η ∂E/∂w_j = η Σ_{k=1}^{p} (d^(k) − y^(k)) ∂y^(k)/∂w_j.
Since the above process updates the weights only after all training patterns are taken into account, this kind of learning is called batch learning.
Learning in Neural Networks
It can be found that when batch learning is used, the errors of all training patterns are summed together, so the learning effect reflects the summary of all training patterns. Thus, the learning cannot make adjustments for individual patterns, and the resultant learning is often unacceptable.
The other kind of learning is called on-line learning or per-example learning. In this type of learning, the changes are made individually for each pattern; i.e.,
Δw_j^(k) = η (d^(k) − y^(k)) ∂y^(k)/∂w_j.
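Batch vs. on-line updating can be illustrated on a toy single-weight linear neuron y = w·x; the data and learning rate are made up:

```python
# Batch vs. on-line gradient descent for y = w * x with
# E = 0.5 * sum_k (d_k - y_k)^2, so dE/dw = -sum_k (d_k - y_k) * x_k.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # (x, d) pairs; the true w is 2

def batch_step(w, eta=0.01):
    grad_sum = sum((d - w * x) * x for x, d in data)
    return w + eta * grad_sum              # one update per pass over ALL data

def online_pass(w, eta=0.01):
    for x, d in data:
        w += eta * (d - w * x) * x         # one update per training pattern
    return w

w = 0.0
for _ in range(200):
    w = batch_step(w)
print(round(w, 3))  # converges close to 2.0
```

Replacing the loop body with `online_pass` gives the per-example variant: the same gradient terms, applied one pattern at a time instead of summed.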
Learning in Neural Networks
When we want an NN to perform some tasks, the NN is realized by finding an appropriate set of weights. In other words, the obtained weights are to capture what we want the NN to be or the knowledge.
The activation values of neurons represent the system at some time snapshot. Thus, they capture the transition state for a specific input set at a certain time.
From the information storage viewpoint, the weights of the links encode the so-called long-term memory and the activation states of neurons encode the short-term memory in the NN.
Outline
Introductions
Fuzzy Systems
Neural Networks: Machine Learning; Neural Network Models; Learning Analysis
Genetic Algorithms
Epilogue
Universal Approximator Theorem
Neural network with as few as one hidden layer using arbitrary squashing activation function and linear or polynomial integration function can approximate virtually any function of interest to any desired degree of accuracy, provided sufficiently many hidden neurons are available.
Any lack of success in applications may arise from inadequate learning, insufficient number of hidden neurons or lack of deterministic relationships between inputs and desired outputs.
The theorem only states the existence of the ideal network; it does not provide any mechanism for finding it.
Learning Performance Analysis
Two types of learning phases must be distinguished in the evaluation of learning performance, especially for offline learning schemes: the training phase and the testing phase.
In the training phase, the system is trained by the given training patterns. Thus, in the training phase, the system is under construction and the convergent behavior of the training is concerned.
For the training performance, the convergent behavior is concerned and it is simple to consider the learning histories (training errors vs. training iterations).
Learning Performance Analysis
The learning convergent behaviors usually are characterized by two properties: the convergent speed and the converged error (training error).
If the system uses an offline learning scheme, the convergence speed may not be a significant factor to consider.
An issue with the converged errors is that the learning may get stuck in local minima if iterative (incremental) learning algorithms are used.
Learning Performance Analysis
Even though the learning algorithm is the major factor in determining the convergent behavior, other factors, such as the system structure and the quality of the training data, may also affect the training performance.
The learning performance of the training phase is to state how accurately the learned system can approximate the desired outputs for a given input in the training data set.
The purpose of learning is to obtain a system that after learning can somehow have a fair chance to behave as required for any input data or in short, to generalize.
Learning Performance Analysis
Thus, in the testing phase, the generalization capability is concerned; that is, whether the learned system can interpret those unlearned patterns well.
In the testing phase, the learned system is tested by another set of patterns, which are not used in the training phase in any way, to define the generalization errors.
The performance in this phase is usually referred to as the generalization capability.
Validation of Generalization
There are several methods for estimating generalization errors:
Split-sample validation: randomly select part of the data as a test set, which must not be used in any way during training. (The most commonly used method.)
Cross-validation: resample the training data set. In k-fold cross-validation, the data is divided into k subsets of equal size. The network is then trained k times, each time leaving out one of the subsets and using only the omitted subset to compute the error criterion. When k equals the number of data points, this is called “leave-one-out” cross-validation.
Bootstrapping: instead of using disjoint subsets of the data, sub-samples are randomly drawn from the data with replacement. It seems to work better than cross-validation.
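The k-fold procedure can be sketched as below; the interleaved slicing is just one possible way to form the folds:

```python
# k-fold cross-validation sketch: split the data into k equal folds, train on
# k-1 folds, and compute the error criterion only on the omitted fold.
def k_fold_splits(data, k):
    folds = [data[i::k] for i in range(k)]      # k roughly equal subsets
    for i in range(k):
        test = folds[i]
        train = [x for j, f in enumerate(folds) if j != i for x in f]
        yield train, test

data = list(range(10))
for train, test in k_fold_splits(data, 5):
    assert set(train) | set(test) == set(data)  # every point is used
    assert not set(train) & set(test)           # test fold never seen in training
print("5-fold splits OK")
```

Each data point appears in exactly one test fold, so every point contributes to the generalization-error estimate while never leaking into its own training set.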
Learning Performance Analysis
In general, there are two different types of generalization: interpolation and extrapolation.
Interpolation can often be done reliably, but extrapolation is notoriously unreliable.
Note that generalization is not always possible for various learning systems despite the assertions in the literature.
Learning Performance Analysis
There are three conditions that are typically necessary (although not sufficient) for good generalization.
Deterministic input-output relationships: The inputs to the network contain sufficient information pertaining to the desired outputs. It is impossible to learn a non-existent function.
Smooth functions: A small change in the inputs should produce a small change in the outputs. Very non-smooth functions (e.g. random noise) cannot be generalized.
Sufficient training data: The used training data should be a sufficiently large and representative subset of the population. Sufficient data can avoid extrapolation.
Overfitting and Underfitting
A system that is not sufficiently complex (i.e., has fewer tunable parameters than required) may fail to fully capture the signal in a complicated data set, leading to underfitting.
A network that is too complex may fit not only the signal but also the noise, leading to overfitting.
Note that overfitting may occur even with noise-free data.
Various remedies have been proposed in the literature: jittering, weight decay, early stopping, Bayesian learning, robust learning algorithms, etc.
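To illustrate one of these remedies, here is a minimal early-stopping sketch (not from the slides; the error curve below is a made-up example of a typical overfitting pattern):

```python
def early_stopping(val_errors, patience=3):
    """Return (epoch, error) of the best validation error, stopping
    once `patience` epochs pass without any improvement."""
    best_err, best_epoch, waited = float("inf"), 0, 0
    for epoch, err in enumerate(val_errors):
        if err < best_err:
            best_err, best_epoch, waited = err, epoch, 0
        else:
            waited += 1
            if waited >= patience:   # no improvement for `patience` epochs
                break
    return best_epoch, best_err

# Hypothetical curve: validation error falls, then rises as overfitting sets in.
errors = [0.9, 0.5, 0.3, 0.25, 0.27, 0.31, 0.40, 0.55]
epoch, err = early_stopping(errors)
assert (epoch, err) == (3, 0.25)  # training would be stopped at epoch 3
```

In practice the model's weights at the best epoch are saved and restored when training halts.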
Local Learning Concept
The minimum disturbance principle suggests that learning should aim not only at reducing the output error for the current training pattern but also at minimizing the disturbance to weights that have already been learned.
A learning system that follows the minimum disturbance principle can learn more effectively. We refer to this as the local learning concept.
Local Learning Concept
In neural networks, the effect of an update is spread over all weights because knowledge is represented in a distributed manner. This violates the minimum disturbance principle and is called global learning.
Local learning can be more efficient, but may not always learn better.
Neural fuzzy systems use spatial relations to define a learning structure that facilitates the local learning concept.
Network Structure for Fuzzy Systems
In this kind of approach, fuzzy models are characterized by a set of parameters, such as the centers and widths of membership functions, the rule relationships, etc.
Since those parameters can be viewed as weights in a network, traditional learning schemes for neural networks can be adopted for the fuzzy modeling problem.
Such approaches are often referred to as neural fuzzy systems or neural-network-based fuzzy systems.
Local Learning Concept
Neural fuzzy systems often show better learning capability than plain neural networks.
However, since local learning restricts updating to pre-defined relations in order to reduce the learning burden, if those relations are incorrect or cannot reflect certain information, the results of local learning may not be acceptable.
Several systems can be classified as local learning systems, such as radial basis function networks, wavelet networks, CMAC, etc.
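A radial basis function network makes the locality visible: each Gaussian basis function is nearly zero far from its center, so a gradient step on the output weights disturbs only nearby weights. A minimal 1-D sketch (illustrative names, not from the slides):

```python
import math

def rbf_predict(x, centers, widths, weights):
    """Output of a 1-D radial basis function network."""
    return sum(w * math.exp(-((x - c) / s) ** 2)
               for c, s, w in zip(centers, widths, weights))

def rbf_update(x, target, centers, widths, weights, lr=0.5):
    """One gradient step on the output weights. Distant basis functions
    have near-zero activation, so their weights barely move: the update
    is local, in line with the minimum disturbance principle."""
    error = target - rbf_predict(x, centers, widths, weights)
    return [w + lr * error * math.exp(-((x - c) / s) ** 2)
            for c, s, w in zip(centers, widths, weights)]

centers = [0.0, 5.0, 10.0]
widths = [1.0, 1.0, 1.0]
weights = [0.0, 0.0, 0.0]

new_w = rbf_update(0.0, 1.0, centers, widths, weights)
assert abs(new_w[0] - 0.5) < 1e-9   # weight near x = 0 changes...
assert abs(new_w[2]) < 1e-9         # ...a far-away weight is undisturbed
```

In a fully connected multilayer network, by contrast, the same training pattern would move every weight.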
Outline
Introductions
Fuzzy Systems
Neural Networks
Genetic Algorithms
  Optimization in Computational Intelligence
  Evolutionary Computation
  Other Derivative-Free Optimization
Epilogue
Optimization in Computational Intelligence
Optimization processes are required in an intelligent system because:
Better selection of applicable knowledge or strategies can result in better performance;
In the learning process, an optimal way of defining the updating rule is required.
In general, an optimization problem requires finding a setting of the system's variable vector such that a certain quality criterion, called a performance function, is optimized. Sometimes, the variable vector must also satisfy some constraints.
Optimization in Computational Intelligence
Traditional optimization approaches develop a formal model that resembles the original function and then solve it by means of traditional mathematical methods.
Evolutionary algorithms have been widely used in various intelligent systems. In fact, in combination with fuzzy systems and neural networks, many applications can be found in the literature.
Evolutionary Computation
An important property of evolutionary algorithms is that the search process requires no auxiliary forms of the fitness function, such as derivatives.
In fact, evolutionary computation should be understood as a general adaptable concept for problem solving rather than a collection of related and ready-to-use algorithms.
Outline
Introductions
Fuzzy Systems
Neural Networks
Genetic Algorithms
  Optimization in Computational Intelligence
  Evolutionary Computation
  Other Derivative-Free Optimization
Epilogue
Evolutionary Computation
The majority of current implementations of evolutionary algorithms descend from three strongly related but independently developed approaches:
Genetic algorithms: use binary strings as genes and search for an optimal chromosome.
Evolutionary programming: evolve finite state machines to predict events on the basis of former observations.
Evolution strategies: solve difficult discrete and continuous parameter optimization problems.
Evolutionary Computation
Evolutionary computation mimics the natural selection process in order to find the best-fitted candidate solution (optimization).
Evolutionary algorithms can be viewed as optimization approaches that use random search with some guidance.
Evolutionary Computation
The guidance is fulfilled by a user-specified fitness function.
The basic evolutionary loop:
1. Initialize population P(t) and evaluate P(t).
2. Apply reproduction and crossover on P(t) to yield C(t).
3. Apply mutation on C(t) to yield D(t), and then evaluate D(t).
4. Select P(t+1) from P(t) and D(t) based on the fitness.
5. If the stop criterion is satisfied, stop; otherwise go to step 2.
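The loop above can be sketched in a few lines of Python. This is a toy instance only: the OneMax fitness function, tournament reproduction, one-point crossover, and the parameter values are all illustrative choices, not part of the slides.

```python
import random

rng = random.Random(1)

def fitness(bits):
    """Toy objective (OneMax): count the 1s in the chromosome."""
    return sum(bits)

def crossover(a, b):
    """One-point crossover of two parent chromosomes."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(bits, rate=0.05):
    """Flip each gene with a small probability."""
    return [1 - g if rng.random() < rate else g for g in bits]

def ga(n_bits=20, pop_size=30, generations=60):
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        # Reproduction: fitter chromosomes are more likely to be parents.
        parents = [max(rng.sample(pop, 3), key=fitness) for _ in range(pop_size)]
        # Crossover yields C(t); mutation yields D(t).
        children = [mutate(crossover(parents[i], parents[-1 - i]))
                    for i in range(pop_size)]
        # Select P(t+1) from P(t) and D(t) based on fitness.
        pop = sorted(pop + children, key=fitness, reverse=True)[:pop_size]
    return max(pop, key=fitness)

best = ga()
assert fitness(best) >= 18  # near-optimal chromosome found
```

Because P(t+1) is selected from both P(t) and D(t), the best chromosome is never lost between generations.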
Evolutionary Computation
Evolutionary computation uses three basic operators to manipulate the genetic composition (chromosomes) of a population:
Reproduction is the process of selecting parents for generating offspring. The most highly rated chromosomes in the current generation are the most likely to be copied into the new generation.
Crossover provides a mechanism for chromosomes to mix and match attributes through random processes.
Mutation changes attributes (genes) in the new generation to introduce new possibilities. Mutation is a very important mechanism for avoiding local minima in the optimization search.
Evolutionary Computation
The above operations generate the new chromosomes for evolution; hopefully, the best-fitted solution can be generated.
Besides, randomness plays an essential role in those operations.
One attractive property of evolutionary algorithms is that, since P(t+1) is selected from both P(t) and D(t), the best solution found never degrades over generations.
Evolutionary Computation
However, because evolutionary algorithms must be adapted to the problem at hand, their operations must be designed by the user.
Moreover, if the optimization is constrained, the initial population and the generation of new chromosomes must be chosen carefully.
Outline
Introductions
Fuzzy Systems
Neural Networks
Genetic Algorithms
  Optimization in Computational Intelligence
  Evolutionary Computation
  Other Derivative-Free Optimization
Epilogue
Other Derivative-Free Optimization
Other often-mentioned approaches are ant algorithms (ACS, ACO, etc.) and particle swarm optimization (PSO).
The overall ideas are similar: they all use fitness values to guide the search, with some random mechanisms associated with the search process.
These approaches can usually achieve better search performance than genetic algorithms.
Other Derivative-Free Optimization
This is because genetic algorithms search solution-wise, while swarm algorithms search component-wise.
Genetic algorithms are also more easily trapped in a local minimum when the initial population has some local-optimum properties.
Swarm algorithms can more easily escape such initial local optima.
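The component-wise character of swarm search shows up directly in the PSO update: every coordinate of every particle is pulled toward its personal best and the swarm's global best. A minimal global-best PSO sketch, minimizing the standard sphere function (parameter values are common textbook defaults, not from the slides):

```python
import random

rng = random.Random(0)

def pso(f, dim=2, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5):
    """Minimize f with a basic global-best PSO."""
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]           # each particle's personal best
    gbest = min(pbest, key=f)[:]          # the swarm's global best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):          # component-wise velocity update
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            if f(pos[i]) < f(pbest[i]):
                pbest[i] = pos[i][:]
                if f(pos[i]) < f(gbest):
                    gbest = pos[i][:]
    return gbest

sphere = lambda x: sum(v * v for v in x)
best = pso(sphere)
assert sphere(best) < 1e-2  # swarm settles near the minimum at the origin
```

Note that, like a GA, PSO uses only fitness values and randomness; no derivatives of f are required.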
Epilogue
Computational intelligence is a new vehicle for the next generation of artificial intelligence.
Nevertheless, computational intelligence alone will take you nowhere.
Incorporating it with other techniques may create new frontiers for our dreams.
Thank you for your attention!
Any questions?!
Shun-Feng Su, Professor, Department of Electrical Engineering,
National Taiwan University of Science and Technology
E-mail: [email protected]