March, 2009
®Copyright of Shun-Feng Su
1
The Essence of Computational Intelligence
計算型智慧的基本概念
Offered by 蘇順豐 Shun-Feng Su
E-mail: [email protected]
Department of Electrical Engineering, National Taiwan University of Science and Technology
Preface
People have always dreamed of having machines that can act like humans.
Artificial Intelligence studies the components that can facilitate such a dream.
Due to the nature of knowledge, traditional artificial intelligence uses symbols to construct the conceptual world.
Preface
Symbolic artificial intelligence is very difficult to manipulate for real-world problems, especially for implementing common-sense knowledge.
Recently, computational intelligence (CI) has become commonly used and has demonstrated good performance in various applications.
CI is so named to distinguish it from traditional symbolic artificial intelligence by its ease of manipulation through numerical knowledge representation.
Preface
The following three methodologies are often considered as CI: Fuzzy Systems, Neural Networks, and Genetic Algorithms (also referred to as Evolutionary Computation).
This talk provides the fundamental concepts and ideas behind these often-mentioned techniques.
Basics for CI
CI is known to have the following characteristics [1]:
Numerical knowledge representation; adaptability; fault tolerance; fast processing speed; error rate optimality.
[1] J. C. Bezdek, “What is computational intelligence?” Computational Intelligence: Imitating Life, J. M. Zurada, R. J. Marks II, and C. J. Robinson, Eds., New York: IEEE Press, pp. 1–12, 1994.
Basics for CI
Possible advantages of using CI are:
Efficiency; Robustness; Good generalization capability; Easy to use; Easy to incorporate problem domain heuristics; Superior performance in various applications.
Basics for CI
Generalization capability means having a fair chance to behave as required for any input data.
Basics for CI
Possible problems encountered while using CI are:
Incomprehensible knowledge; lack of theoretical analysis tools (e.g., for stability, performance guarantees, etc.); various subjective parameters required; lack of benchmarks for performance evaluation.
These may be disadvantages, but they sometimes still provide good means for applications.
Outline
Introductions
Fuzzy Systems: Uncertainty and Its Representation; Fuzzy Operations and Uncertainty Reasoning; Fuzzy Logic Control
Neural Networks
Genetic Algorithms
Epilogue
Introduction of Fuzzy Systems
Fuzzy systems have been widely used in various applications.
In fact, the fundamental idea behind fuzzy systems is to include uncertainty in the process.
Such an inclusion provides extra information so that the systems can be more accurate.
In other words, fuzziness means vagueness, but it can provide accuracy because of this extra information.
Uncertainties in Intelligent systems
Uncertainties exist for the following reasons:
noise always exists in the environment;
facts being true or events occurring may not be certain;
stored knowledge is incomplete or liable to change;
exceptions are inevitable for any realistic knowledge;
simplifications are necessary to reduce the complexity of the system;
partitions of continuous variables for rule-based knowledge result in the fuzzy set concept.
Uncertainties in Intelligent systems
Traditional systems always use nominal values to reason and make decisions. But using more information can yield more accurate decision making.
Thus, to act intelligently, those uncertainties cannot be ignored in computing.
To incorporate uncertainties in the decision making process, the system must be capable of representing uncertainty and also be equipped with the capability of approximate reasoning.
Fuzzy Sets As A Representation for Uncertainty
The traditional sets are called classical sets or crisp sets.
In a crisp set, membership is crisp and can be described by a simple yes/no answer. That is, an element is either in the set or not in the set. The membership function of A is defined as
μ_A(x) = 1, when x ∈ A,
μ_A(x) = 0, when x ∉ A.
Fuzzy Sets As A Representation for Uncertainty
The range of the membership function μ_A of a fuzzy set A is now the interval [0,1] instead of only the binary values {0,1}.
Example: Let a fuzzy set A represent the concept “real numbers that are close to 5” and the membership function for A is
μ_A(x) = 1 / (1 + 10(x − 5)²)
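As a minimal sketch, the membership function above can be coded directly in Python:

```python
# Membership function for the fuzzy set A = "real numbers close to 5",
# using the definition mu_A(x) = 1 / (1 + 10*(x - 5)^2).
def mu_A(x):
    return 1.0 / (1.0 + 10.0 * (x - 5.0) ** 2)

print(mu_A(5.0))   # 1.0: 5 is fully "close to 5"
print(mu_A(6.0))   # ~0.0909
print(mu_A(10.0))  # ~0.004: far from 5, almost no membership
```

The degree decays smoothly with distance from 5, which is exactly the graded yes/no a crisp set cannot express.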
Fuzzy Sets As A Representation for Uncertainty
For example, with membership functions M and F (say, for “medium” and “fast” speed):
When x = 62mph, M(x)=0.4667, F(x)=0.5333.
When x = 63mph, M(x)=0.5333, F(x)=0.4667.
When x = 69mph, M(x)=0.0667, F(x)=0.9333.
Uncertainty Representations
Two often-used uncertainty representations are fuzzy sets and probability. From the viewpoint of the uncertainty concept per se, they capture two different types of uncertainty.
A fuzzy set captures the idea of vagueness: the degree of uncertainty about what something is.
What is rain? What is fast?
Probability captures the idea of ambiguity: uncertainty about whether something occurs.
Will it rain? What will the outcome of a die roll be?
Fuzzy vs. Probability
From the mathematical representation viewpoint, they are comparable and possess different reasoning behaviors.
Reasoning with probabilities is mathematically sound but difficult to manipulate because it lacks modularity.
Reasoning with fuzzy sets does not provide mathematically sound inference and is subjective, but it is easy to manipulate.
In fact, other types of uncertainty can be found in the literature.
Outline
Introductions
Fuzzy Systems: Uncertainty and Its Representation; Fuzzy Operations and Uncertainty Reasoning; Fuzzy Logic Control
Neural Networks
Genetic Algorithms
Epilogue
Operations on Fuzzy Sets – The Extension Principle
Given a function f : U → V, suppose now the input domain is a fuzzy set A in U. What will be the output?
The extension principle states that the fuzzy degree of x in A will be the fuzzy degree of y = f(x) in f(A).
The concept is to pass the membership degree of x to f(x); i.e., the function itself is crisp and will not introduce any uncertainty. Thus, the membership degree of x will truly appear for f(x).
Extension Principle
Two problems arise:
f(x) may be a many-to-one function, i.e., f(x1) = f(x2) with x1 ≠ x2. Then the membership degree could be μ(x1) or μ(x2); the resultant membership degree is taken as max(μ(x1), μ(x2)).
The input domain may consist of multiple variables. Then f(x1, x2, …, xn) is obtained only when all of x1, x2, …, xn appear; the membership degree of f(x1, x2, …, xn) is min(μ(x1), μ(x2), …, μ(xn)).
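A small Python sketch of the extension principle on a discrete universe; the example fuzzy set and function below are illustrative, not from the slides:

```python
# Extension principle on a discrete universe: for y = f(x), take
# mu_B(y) = max over all x with f(x) = y of mu_A(x)  (max for many-to-one).
def extend(f, A):
    """A: dict mapping x -> membership degree; returns B = f(A) as a dict."""
    B = {}
    for x, deg in A.items():
        y = f(x)
        B[y] = max(B.get(y, 0.0), deg)  # many-to-one case: take the max degree
    return B

A = {-2: 0.3, -1: 0.7, 0: 1.0, 1: 0.7, 2: 0.3}   # fuzzy set "close to 0"
B = extend(lambda x: x * x, A)                    # y = x^2 is many-to-one
print(B)  # {4: 0.3, 1: 0.7, 0: 1.0}
```

Note how x = −1 and x = 1 both map to y = 1, and the resulting degree is max(0.7, 0.7) = 0.7, exactly the many-to-one rule above.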
Extension Principle
The extension principle allows the generalization of crisp mathematical concepts to the fuzzy set framework, extending point-to-point mappings to mappings between fuzzy sets.
It provides a means for any function f that maps an n-tuple (x1, x2, …, xn) in the crisp set U to a point in the crisp set V to be generalized to map n fuzzy subsets of U to a fuzzy subset of V.
Any mathematical relationship between non-fuzzy elements can thus be extended to deal with fuzzy entities.
Classic Logic Reasoning
Logic reasoning finds other true propositions (facts) from given true propositions (knowledge and/or facts).
The scenario of logic reasoning can be interpreted as follows: there is a knowledge base containing facts and rules; a new piece of information or a description of the current situation is specified; then we want to find out what the system can conclude, or which action should be taken, under the current circumstances.
The traditional reasoning rule is called modus ponens: (A ∧ (A→B)) → B. That is, one piece of knowledge A→B and a fact A result in the fact B.
Approximate Reasoning for Fuzzy sets
The most used inference rule is (A1 ∧ (A2→B)) → B. In classic logic, either A1 = A2 or A1 ≠ A2. Therefore, with the match-and-fire property, either B is concluded or it is not.
But with the use of fuzzy sets, A1 or A2 (or both) is a fuzzy set. Then what can the reasoning process conclude?
Example: (speed = 95 km/hr) ∧ (speed is too fast → pull back the throttle). Should the throttle be pulled back?
In the most common cases, A2 is a fuzzy set and A1 is a fuzzy singleton (a crisp value). Note that B can be a crisp value or a fuzzy set. However, the rule A2→B is hardly ever fuzzy.
Approximate Reasoning for Fuzzy sets
The most used reasoning format is one of the categorical reasoning schemes, called the compositional rule of inference or the generalized modus ponens:
(X is A) and (IF (X is B) then (Y is C)) results in (Y is AR),
where X and Y are fuzzy variables and A, B, and C are fuzzy labels (sets). Note that the resultant AR for Y is a fuzzy set. Usually, the membership function of AR is computed as
μ_AR(v) = max_u min( μ_A(u), t(μ_B(u), μ_C(v)) ).
Approximate Reasoning for Fuzzy sets
The above result can be viewed as an application of the extension principle.
In μ_AR(v) = max_u min( μ_A(u), t(μ_B(u), μ_C(v)) ), finding whether v is in Y involves the selection among the various u (the max operation), the existence of x = u in A, and (the min operation) the relation between x = u and y = v.
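The compositional rule can be sketched in Python on small discrete universes; the universes and membership values below are made up for illustration, with min as the t-norm:

```python
# Generalized modus ponens with max-min composition:
# mu_AR(v) = max over u of min(mu_A(u), min(mu_B(u), mu_C(v))).
def gmp(mu_A, mu_B, mu_C, U, V):
    result = {}
    for v in V:
        result[v] = max(min(mu_A[u], min(mu_B[u], mu_C[v])) for u in U)
    return result

U = [0, 1, 2, 3]                           # input universe
V = [0, 1, 2]                              # output universe
mu_A = {0: 0.0, 1: 1.0, 2: 0.5, 3: 0.0}    # fact: "X is A"
mu_B = {0: 0.0, 1: 0.5, 2: 1.0, 3: 0.5}    # rule antecedent B
mu_C = {0: 0.2, 1: 0.6, 2: 1.0}            # rule consequent C
print(gmp(mu_A, mu_B, mu_C, U, V))  # {0: 0.2, 1: 0.5, 2: 0.5}
```

The output fuzzy set is the consequent C clipped at the degree to which the fact A matches the antecedent B (here 0.5).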
Approximate Reasoning for Fuzzy sets
Note that from the logic viewpoint, the implication p→q is equivalent to ¬p∨q. However, this equivalence only states that the logic of p→q equals that of ¬p∨q. In the reasoning, the logic of the implication is assumed to be true, and the question is whether the current situation (x = u and y = v) matches the rule IF (X is B) then (Y is C).
Therefore, the most commonly used relation is to compute the t-norm of μ_B(u) and μ_C(v).
Approximate Reasoning for Fuzzy sets
Example: R1: IF (X is A1) and (Y is B1) then (Z is C1). R2: IF (X is A2) and (Y is B2) then (Z is C2).
Now, for an input (x0, y0), the reasoning can be shown graphically: each rule fires with strength min(μ_Ai(x0), μ_Bi(y0)), its consequent Ci is clipped at that strength, and the clipped sets are combined by max.
Outline
Introductions
Fuzzy Systems: Uncertainty and Its Representation; Fuzzy Operations and Uncertainty Reasoning; Fuzzy Logic Control
Neural Networks
Genetic Algorithms
Epilogue
Fuzzy Logic Control
A Fuzzy Logic Controller (FLC) is a controller described by a collection of fuzzy rules (e.g. IF-THEN rules) involving linguistic variables.
The original idea behind fuzzy control is to incorporate human “expert experience” into the design of controllers.
The utilization of linguistic variables, fuzzy control rules and approximate reasoning provides a means to incorporate human expert experience in designing the controller.
Rationale behind Fuzzy Logic Control
In an FLC, the rule structure provides the adaptation among strategies, and the fuzzy mechanism provides the interpolating capability among rules.
With this interpolating capability, the transition between rules is gradual rather than abrupt: the so-called softening process.
But in recent developments, fuzzy control is used because it consists of multiple strategies (rules or controllers) for different situations, which of course can give better control performance than one complicated controller.
Basic Structure of Fuzzy Logic Control
A typical architecture of an FLC consists of four principal components: a fuzzifier, a fuzzy rule base, an inference engine, and a defuzzifier.
Fuzzy Logic Control
• Knowledge usually is in a rule structure, and rule structures need a partition of the input space.
• Fuzzy control uses fuzzy partition.
Fuzzy Logic Control
With a fuzzy partition:
The consequences of all matched rules must be transformed into actions.
To use fuzzy rules, the input values must be transformed into fuzzy labels.
Fuzzy Logic Control
Such a rule-based system with a fuzzy partition, in which input values are mapped into fuzzy labels and the consequences of all matched rules are transformed into actions, is also referred to as a fuzzy system.
Basic Structure of Fuzzy Logic Control
The fuzzifier is to transform crisp measured data (e.g., speed=100Km/hr) into suitable linguistic labels (e.g. speed is too fast).
The fuzzy rule base stores the knowledge in rule forms about how to control the system to be controlled (e.g., IF “speed is too low” THEN “increase the throttle setting”).
Basic Structure of Fuzzy Logic Control
The inference engine is to infer desired control strategies from rules by performing approximate reasoning based on current states.
The defuzzifier is to yield a non-fuzzy action or decision from the inferred control strategy (a fuzzy set) by the inference engine.
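The four components can be sketched end to end for a single-input controller. The membership shapes, rule actions, and numbers below are illustrative assumptions, not the slides' example:

```python
# A minimal single-input FLC sketch: fuzzify a speed error, fire two rules,
# and defuzzify by a weighted average of the rule outputs.

def tri(x, a, b, c):
    """Triangular membership function rising from a to a peak at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def flc(error):
    # Fuzzifier: degrees for "negative" and "positive" speed error.
    neg = tri(error, -20.0, -10.0, 0.0)
    pos = tri(error, 0.0, 10.0, 20.0)
    # Rule base: IF error is negative THEN throttle +5; IF positive THEN -5.
    strengths = [neg, pos]
    actions = [5.0, -5.0]
    # Defuzzifier: weighted average of the fired actions.
    total = sum(strengths)
    return sum(s * a for s, a in zip(strengths, actions)) / total if total else 0.0

print(flc(-10.0))  # 5.0: fully "negative", so increase the throttle
print(flc(5.0))    # -5.0: only "positive" fires
```

Between the rule peaks the output blends the two actions gradually, which is the softening behavior described earlier.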
Fuzzy Systems
Mamdani fuzzy rules :
If (X is A) and (Y is B) … then (Z is C)
Note that C is a fuzzy set.
TSK (in modeling) or TS (in control) fuzzy rules :
If (X is A) and (Y is B) … then Z=f(X,Y).
Now, f() is a crisp function.
Fuzzy Systems
Mamdani fuzzy rules : If (X is A) and (Y is B) … then (Z is C)
TSK (in modeling) or TS (in control) fuzzy rules : If (X is A) and (Y is B) … then Z=f(X,Y).
The approximate reasoning for the output of a fuzzy rule is obtained from extension principle as:
μ_AR(v) = max_u min( μ_A(u), t(μ_B(u), μ_C(v)) ).
Fuzzy Systems
Mamdani fuzzy rules: COA (center-of-area) defuzzification.
To find the center of the area, numerical integration is needed.
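As a sketch, COA can be computed with a simple Riemann sum; the triangular output set below is an illustrative assumption:

```python
# COA defuzzification by numerical integration: the crisp output is
# z* = integral(z * mu(z) dz) / integral(mu(z) dz), approximated by a
# midpoint Riemann sum over a sampled output universe.
def coa(mu, z_min, z_max, n=1000):
    dz = (z_max - z_min) / n
    zs = [z_min + (i + 0.5) * dz for i in range(n)]
    num = sum(z * mu(z) for z in zs) * dz
    den = sum(mu(z) for z in zs) * dz
    return num / den

# Symmetric triangular output set centered at 3 -> centroid at 3.
tri = lambda z: max(0.0, 1.0 - abs(z - 3.0))
print(round(coa(tri, 0.0, 6.0), 3))  # 3.0
```

The summation over many sample points is the computational cost that the TS form on the next slides avoids.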
Fuzzy Systems
TS fuzzy rules: the result is somewhat also called COA, but without numerical integration. It is obtained as
z = ( Σ_{i=1}^{m} α_i f_i ) / ( Σ_{i=1}^{m} α_i ),
where α_i and f_i are the firing strength and the fired result of the i-th rule, and m is the number of rules.
This is simple and easy to calculate. Most importantly, it can be used in any mathematical operations, such as taking derivatives.
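The TS weighted-average output is essentially a one-liner; the firing strengths and consequent values below are illustrative:

```python
# TS-style output: z = sum(alpha_i * f_i) / sum(alpha_i), with firing
# strengths alpha_i and fired consequents f_i; no numerical integration.
def ts_output(alphas, fs):
    num = sum(a * f for a, f in zip(alphas, fs))
    den = sum(alphas)
    return num / den

print(ts_output([0.2, 0.8], [1.0, 3.0]))  # 2.6
```

Being a ratio of sums, the expression is differentiable in its parameters, which is what makes it usable inside gradient-based and adaptive designs.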
Fuzzy Systems
Thus, it can be found that in recent developments, most approaches consider TS (or TSK) fuzzy models.
TS fuzzy models also have another advantage in applications: the output of a TS fuzzy model can be more sensitive to changes in the inputs.
This can eliminate the chattering effects in the final control stage that occur with traditional fuzzy models (Mamdani fuzzy rules).
Fuzzy System
A fuzzy approximator is constructed from a set of fuzzy rules of the form
R_F^l : IF x_1 is A_1^l and … and x_n is A_n^l THEN y is B^l, for l = 1, 2, …, M.
Generally, B^l is a fuzzy singleton.
In the literature, this fuzzy model can be regarded both as a Mamdani fuzzy model (with singleton consequent fuzzy sets) and as a TS fuzzy model (with a constant crisp function).
A commonly-used fuzzy model in control.
Fuzzy System
For the same rule form, R_F^l : IF x_1 is A_1^l and … and x_n is A_n^l THEN y is B^l, with B^l a fuzzy singleton:
To me, since no numerical integration is needed, it is a TS fuzzy model. Also, no membership functions are used in the consequences.
Fuzzy System
The fuzzy system with center-of-area–like defuzzification and product inference can be obtained as
y = f(x) = ( Σ_{l=1}^{M} f^l Π_{i=1}^{n} μ_{A_i^l}(x_i) ) / ( Σ_{l=1}^{M} Π_{i=1}^{n} μ_{A_i^l}(x_i) ),
where the product is the t-norm operation over all premise parts.
It is a universal function approximator and can be written as y = f(x) = θ^T ω(x).
Fuzzy System
This form, y = f(x) = θ^T ω(x), is what is used in adaptive fuzzy control.
It is simple and differentiable.
Note that ω is a function of the states.
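A sketch of y = θᵀω(x) for a scalar input, with Gaussian premise memberships and normalized basis functions; the centers, width, and θ values are illustrative assumptions:

```python
import math

# Fuzzy system y = theta^T * omega(x): Gaussian premise memberships,
# product inference, and singleton consequents theta_l.
def omega(x, centers, width=1.0):
    """Normalized fuzzy basis functions omega_l(x) for a scalar input x."""
    w = [math.exp(-((x - c) / width) ** 2) for c in centers]
    s = sum(w)
    return [wi / s for wi in w]

def fuzzy_system(x, theta, centers):
    # Linear in theta: this is why adaptive laws for theta are easy to derive.
    return sum(t * w for t, w in zip(theta, omega(x, centers)))

centers = [-1.0, 0.0, 1.0]   # premise membership centers
theta = [-2.0, 0.0, 2.0]     # singleton consequents (the adjustable parameters)
print(fuzzy_system(0.0, theta, centers))  # 0.0 by symmetry
```

Because the output is linear in θ, tuning θ is a linear estimation problem even though f(x) itself is nonlinear in x.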
Fuzzy System
It should be noted that the above system is nonlinear. But it can be seen that the form is virtually linear (linear in the parameters).
Thus, various approaches have been proposed to handle nonlinear systems with linear-system techniques, exploiting the linearity borne by each rule, such as common-P (Lyapunov) stability, LMI design processes, adaptive fuzzy control, etc.
Outline
Introductions
Fuzzy Systems
Neural Networks: Machine Learning; Neural Network Models; Learning Analysis
Genetic Algorithms
Epilogue
Why Learning is Needed
The problem domain knowledge for a complicated system usually does not exist or is extremely difficult to obtain.
The system may be asked to learn knowledge from experience by itself.
Note that learning is an important capability for an intelligent system, but not a necessary one.
As can be seen in recent research, most intelligent systems have been equipped with learning capability.
What is Learning?
There are two important definitions for learning:
H. Simon defined learning as – “any change in a system that allows it to perform better the second time on the repetition of the same task or on another task drawn from the same population.”
B. Kosko defined learning as change in all cases. “A system learns if and only if the system parameter vector or matrix has a nonzero time derivative.”
Concept of Machine Learning
The first definition asks that a system with learning should always behave better as learning continues.
The second definition is mainly for numerical learning.
The fundamental problem of learning is how to change the system to make the system behave as required.
Concept of Machine Learning
How to change the system is specified by the so-called learning algorithms.
Symbolic Learning vs. Numerical Learning
In a symbolic learning scheme, the representation of knowledge is symbolic, such as the predicate calculus and rules. The learning behavior is to build a conceptual relationship between those symbols from learned examples.
In a numerical learning scheme, the knowledge somehow is coded into numerical data. The learning behavior is concerned about changing the values of parameters numerically.
Symbolic Learning
Examples of symbolic learning schemes: Inductive Learning, Case-based Learning, Explanation-based Learning, etc.
Symbolic learning is well suited to interaction with human experts, but it is very sensitive to noise.
The major drawback of this kind of learning is that the knowledge manipulation is very complicated.
Traditional artificial intelligence focused on symbolic learning. However, due to the difficulty of manipulation and the sensitivity to noise, symbolic learning did not actually provide significant advances in real-world applications.
Numerical Learning
Examples of numerical learning schemes: Neural Networks, Cerebellar Model Arithmetic Computer (CMAC), Fuzzy Modeling, etc.
Numerical learning is computationally efficient and insensitive to noise, but incomprehensible. It is easy to use but difficult to incorporate expert knowledge into.
Recently, due to the use of neural networks and fuzzy systems, numerical learning has drawn more attention.
Applications of numerical learning schemes can be found in various disciplines, such as artificial intelligence, computer science, control engineering, decision theory, expert systems, operations research, pattern recognition, and robotics.
Concept of Learning
Depending on the type of information used in determining how to change the system, learning schemes are usually categorized into three kinds: supervised learning, unsupervised learning, and reinforcement learning.
Reinforcement learning is sometimes also said to be supervised learning, but with less instructive supervision.
Concept of Learning
In fact, most successful learning approaches are supervised learning, due to its simplicity in the required task.
Unsupervised learning is used for finding common features or for clustering (self-organizing).
Reinforcement learning is fantastic in its ideas, but due to its intricacy in learning (such as delayed reward and the decoupling between two learning systems), more study must be conducted.
Outline
Introductions
Fuzzy Systems
Neural Networks: Machine Learning; Neural Network Models; Learning Analysis
Genetic Algorithms
Epilogue
Introduction of Neural Networks
Artificial neural networks (ANN), or simply neural networks (NN), are systems inspired by modeling the networks of biological neurons in the brain.
NN are a promising new generation of information processing systems that demonstrate the ability to learn, recall, and generalize from training patterns or data.
Introduction of Neural Networks
NN have a large number of highly interconnected processing elements (PE) or neurons that usually operate in parallel.
NN are good at tasks such as pattern matching and pattern classification, function approximation, optimization, vector quantization, and data clustering. However, traditional computers are faster in algorithmic computational tasks and precise arithmetic operations.
Introduction of Neural Networks
Since neural networks do not use a mathematical model of how a system’s output depends on its input (so-called model-free estimator), neural network architectures can be applied to a wide variety of problems.
Like brains, neural networks recognize patterns we cannot define. This is the property of recognition without definition.
Introduction of Neural Networks
An NN is a parallel distributed information-processing structure with the following characteristics:
- It is a neurally inspired mathematical model.
- It consists of a large number of highly interconnected processing elements (neurons).
- Its connections (weights) hold the knowledge.
Introduction of Neural Networks
- A neuron can dynamically respond to its stimulus, and the response completely depends on its local information.
- It has the ability to learn, recall, and generalize from training data by assigning or adjusting the connection weights.
- Its collective behavior demonstrates the computational power, and no single neuron carries specific information (distributed representation property).
Basic Models of Neural Networks
Models of ANNs are specified by three basic entities:
1. Neuron Models: It describes how the neurons process the input and how the output is generated.
2. Connectivity: It defines how those neurons are interconnected.
3. Learning Algorithms: It defines how the connecting weights are updated to adjust the network so as to behave as required.
Basic Models of Neural Networks
The processing in a neuron is separated into two parts: input and output.
Associated with the input of a neuron is an integration function f, which serves to combine information, activation, or evidence from an external source or other neurons into a net-input to the neuron.
The most commonly used integration function is linear and written as:
f_i = net_i = Σ_{j=1}^{m} w_ij x_j − θ_i, for i = 1, 2, …, n,
where θ_i is the threshold of the i-th neuron.
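The linear integration function followed by an activation can be sketched as below; the sigmoid choice and the sample weights are assumptions for illustration:

```python
import math

# Net input of neuron i: net_i = sum_j w_ij * x_j - theta_i, followed by a
# sigmoid activation (one common choice of activation function).
def neuron(weights, x, theta):
    net = sum(w * xj for w, xj in zip(weights, x)) - theta
    return 1.0 / (1.0 + math.exp(-net))   # sigmoid squashes net into (0, 1)

print(neuron([0.5, -0.3], [1.0, 2.0], -0.1))  # net ~ 0, so output ~ 0.5
```

A zero net input gives an output of 0.5, the midpoint of the sigmoid; larger net inputs push the output toward 1.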
Basic Models of Neural Networks
The output function of a neuron is usually called the activation function in that the output of a neuron serves the role of activation of the meaning stored in the neuron.
Learning in Neural Networks
As we have mentioned, the basic characteristic of ANNs is that they have the capability of learning.
Iterative learning procedures are used for a variety of ANN architectures.
Learning in ANNs can be accomplished in several ways: establishment of connections between neurons; adjustment of the weight values on the links; adjustment of threshold values in neurons.
In fact, these processes can all be considered as the adjustment of weight values on the links.
Learning in Neural Networks
The backpropagation (BP) learning algorithm is usually applied for learning. Such networks are also referred to as backpropagation networks.
The fundamental idea is that a cost function E(w) is defined, such as
E(w) = (1/2) Σ_{k=1}^{p} (d^(k) − y^(k))²,
and then the updating algorithm is Δw = −η∇E(w), or
Δw_j = −η ∂E/∂w_j = η Σ_{k=1}^{p} (d^(k) − y^(k)) ∂y^(k)/∂w_j.
Since the above process updates the weights only after all training patterns are taken into account, this kind of learning is called batch learning.
Learning in Neural Networks
It can be found that when batch learning is used, the errors of all training patterns are summed together, so the learning effect reflects the summary of all training patterns. Thus, the learning cannot make adjustments for individual patterns, and the resultant learning is often unacceptable.
The other kind of learning is called on-line learning or per-example learning. In this type of learning, the changes are made individually for each pattern; i.e.,
Δw_j^(k) = η (d^(k) − y^(k)) ∂y^(k)/∂w_j.
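Batch vs. on-line updating can be illustrated on a toy single-weight linear neuron y = w·x; the data and learning rate are made up:

```python
# Batch vs. on-line gradient descent for y = w * x with
# E = 0.5 * sum_k (d_k - y_k)^2, so dE/dw = -sum_k (d_k - y_k) * x_k.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # (x, d) pairs; the true w is 2

def batch_step(w, eta=0.01):
    grad_sum = sum((d - w * x) * x for x, d in data)
    return w + eta * grad_sum              # one update per pass over ALL data

def online_pass(w, eta=0.01):
    for x, d in data:
        w += eta * (d - w * x) * x         # one update per training pattern
    return w

w = 0.0
for _ in range(200):
    w = batch_step(w)
print(round(w, 3))  # converges close to 2.0
```

Replacing the loop body with `online_pass` gives the per-example variant: the same gradient terms, applied one pattern at a time instead of summed.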
Learning in Neural Networks
When we want an NN to perform some tasks, the NN is realized by finding an appropriate set of weights. In other words, the obtained weights are to capture what we want the NN to be or the knowledge.
The activation values of neurons represent the system at some time snapshot. Thus, they capture the transition state for a specific input set at a certain time.
From the information storage viewpoint, the weights of the links encode the so-called long-term memory and the activation states of neurons encode the short-term memory in the NN.
Outline
Introductions
Fuzzy Systems
Neural Networks: Machine Learning; Neural Network Models; Learning Analysis
Genetic Algorithms
Epilogue
Universal Approximator Theorem
Neural network with as few as one hidden layer using arbitrary squashing activation function and linear or polynomial integration function can approximate virtually any function of interest to any desired degree of accuracy, provided sufficiently many hidden neurons are available.
Any lack of success in applications may arise from inadequate learning, insufficient number of hidden neurons or lack of deterministic relationships between inputs and desired outputs.
The theorem only states the existence of the ideal network; it does not provide any mechanism for finding it.
Learning Performance Analysis
Two types of learning phases must be distinguished in the evaluation of learning performance, especially for offline learning schemes: the training phase and the testing phase.
In the training phase, the system is trained by the given training patterns. Thus, in the training phase, the system is under construction and the convergent behavior of the training is concerned.
For the training performance, the convergent behavior is concerned and it is simple to consider the learning histories (training errors vs. training iterations).
Learning Performance Analysis
The learning convergent behaviors usually are characterized by two properties: the convergent speed and the converged error (training error).
If the system uses an offline learning scheme, the convergence speed may not be a significant factor to consider.
An issue with the converged errors is that the learning may get stuck in local minima if iterative (incremental) learning algorithms are used.
Learning Performance Analysis
Even though the learning algorithm is the major factor in determining the convergent behavior, other factors, such as the system structure and the quality of the training data, may also affect the training performance.
The learning performance of the training phase is to state how accurately the learned system can approximate the desired outputs for a given input in the training data set.
The purpose of learning is to obtain a system that after learning can somehow have a fair chance to behave as required for any input data or in short, to generalize.
Learning Performance Analysis
Thus, in the testing phase, the generalization capability is concerned; that is, whether the learned system can interpret those unlearned patterns well.
In the testing phase, the learned system is tested by another set of patterns, which are not used in the training phase in any way, to define the generalization errors.
The performance in this phase is usually referred to as the generalization capability.
Validation of Generalization
There are several methods for estimating generalization errors:
Split-sample validation: randomly select part of the data as a test set, which must not be used in any way during training. (The most commonly used method.)
Cross-validation: resample the training data set. In k-fold cross-validation, the data is divided into k subsets of equal size. The network is then trained k times, each time leaving out one of the subsets and using only the omitted subset to compute the error criterion. When k equals the number of data points, this is called “leave-one-out” cross-validation.
Bootstrapping: instead of using disjoint subsets of the data, sub-samples are randomly drawn from the data with replacement. It seems to work better than cross-validation.
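The k-fold procedure can be sketched as below; the interleaved slicing is just one possible way to form the folds:

```python
# k-fold cross-validation sketch: split the data into k equal folds, train on
# k-1 folds, and compute the error criterion only on the omitted fold.
def k_fold_splits(data, k):
    folds = [data[i::k] for i in range(k)]      # k roughly equal subsets
    for i in range(k):
        test = folds[i]
        train = [x for j, f in enumerate(folds) if j != i for x in f]
        yield train, test

data = list(range(10))
for train, test in k_fold_splits(data, 5):
    assert set(train) | set(test) == set(data)  # every point is used
    assert not set(train) & set(test)           # test fold never seen in training
print("5-fold splits OK")
```

Each data point appears in exactly one test fold, so every point contributes to the generalization-error estimate while never leaking into its own training set.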
Learning Performance Analysis
In general, there are two different types of generalization: interpolation and extrapolation.
Interpolation can often be done reliably, but extrapolation is notoriously unreliable.
Note that generalization is not always possible for various learning systems despite the assertions in the literature.
Learning Performance Analysis
There are three conditions that are typically necessary (although not sufficient) for good generalization.
Deterministic input-output relationships: The inputs to the network contain sufficient information pertaining to the desired outputs. It is impossible to learn a non-existent function.
Smooth functions: A small change in the inputs should produce a small change in the outputs. Very non-smooth functions (e.g. random noise) cannot be generalized.
Sufficient training data: The used training data should be a sufficiently large and representative subset of the population. Sufficient data can avoid extrapolation.
Overfitting and Underfitting
A system that is not sufficiently complex (i.e., has fewer tunable parameters than required) may fail to fully capture the signal in a complicated data set, leading to underfitting.
A network that is too complex may fit not only the signal but also the noise, leading to overfitting.
Note that overfitting may occur even with noise-free data.
Various remedies have been proposed in the literature: jittering, weight decay, early stopping, Bayesian learning, robust learning algorithms, etc.
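To illustrate one of these remedies, here is a minimal early-stopping sketch (not from the slides; the error curve below is a made-up example of a typical overfitting pattern):

```python
def early_stopping(val_errors, patience=3):
    """Return (epoch, error) of the best validation error, stopping
    once `patience` epochs pass without any improvement."""
    best_err, best_epoch, waited = float("inf"), 0, 0
    for epoch, err in enumerate(val_errors):
        if err < best_err:
            best_err, best_epoch, waited = err, epoch, 0
        else:
            waited += 1
            if waited >= patience:   # no improvement for `patience` epochs
                break
    return best_epoch, best_err

# Hypothetical curve: validation error falls, then rises as overfitting sets in.
errors = [0.9, 0.5, 0.3, 0.25, 0.27, 0.31, 0.40, 0.55]
epoch, err = early_stopping(errors)
assert (epoch, err) == (3, 0.25)  # training would be stopped at epoch 3
```

In practice the model's weights at the best epoch are saved and restored when training halts.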
Local Learning Concept
The minimum disturbance principle suggests that learning should aim not only at reducing the output error for the current training pattern but also at minimizing the disturbance to weights that have already been learned.
A learning system that follows the minimum disturbance principle can learn more effectively. We refer to this as the local learning concept.
Local Learning Concept
In neural networks, the effect of an update is spread over all weights because knowledge is represented in a distributed manner. This violates the minimum disturbance principle and is called global learning.
Local learning can be more efficient, but may not always learn better.
Neural fuzzy systems use spatial relations to define a learning structure that facilitates the local learning concept.
Network Structure for Fuzzy Systems
In this kind of approach, fuzzy models are characterized by a set of parameters, such as the centers and widths of membership functions, the rule relationships, etc.
Since those parameters can be viewed as weights in a network, traditional learning schemes for neural networks can be adopted for the fuzzy modeling problem.
Such approaches are often referred to as neural fuzzy systems or neural-network-based fuzzy systems.
Local Learning Concept
Neural fuzzy systems often show better learning capability than plain neural networks.
However, since local learning restricts updating to pre-defined relations in order to reduce the learning burden, if those relations are incorrect or cannot reflect certain information, the results of local learning may not be acceptable.
Several systems can be classified as local learning systems, such as radial basis function networks, wavelet networks, CMAC, etc.
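A radial basis function network makes the locality visible: each Gaussian basis function is nearly zero far from its center, so a gradient step on the output weights disturbs only nearby weights. A minimal 1-D sketch (illustrative names, not from the slides):

```python
import math

def rbf_predict(x, centers, widths, weights):
    """Output of a 1-D radial basis function network."""
    return sum(w * math.exp(-((x - c) / s) ** 2)
               for c, s, w in zip(centers, widths, weights))

def rbf_update(x, target, centers, widths, weights, lr=0.5):
    """One gradient step on the output weights. Distant basis functions
    have near-zero activation, so their weights barely move: the update
    is local, in line with the minimum disturbance principle."""
    error = target - rbf_predict(x, centers, widths, weights)
    return [w + lr * error * math.exp(-((x - c) / s) ** 2)
            for c, s, w in zip(centers, widths, weights)]

centers = [0.0, 5.0, 10.0]
widths = [1.0, 1.0, 1.0]
weights = [0.0, 0.0, 0.0]

new_w = rbf_update(0.0, 1.0, centers, widths, weights)
assert abs(new_w[0] - 0.5) < 1e-9   # weight near x = 0 changes...
assert abs(new_w[2]) < 1e-9         # ...a far-away weight is undisturbed
```

In a fully connected multilayer network, by contrast, the same training pattern would move every weight.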
Outline
Introductions
Fuzzy Systems
Neural Networks
Genetic Algorithms
  Optimization in Computational Intelligence
  Evolutionary Computation
  Other Derivative-Free Optimization
Epilogue
Optimization in Computational Intelligence
Optimization processes are required in an intelligent system because:
Better selection of applicable knowledge or strategies can result in better performance;
In the learning process, an optimal way of defining the updating rule is required.
In general, an optimization problem requires finding a setting of the system's variable vector such that a certain quality criterion, called a performance function, is optimized. Sometimes, the variable vector must also satisfy some constraints.
Optimization in Computational Intelligence
Traditional optimization approaches develop a formal model that resembles the original function and then solve it by means of traditional mathematical methods.
Evolutionary algorithms have been widely used in various intelligent systems. In fact, in combination with fuzzy systems and neural networks, many applications can be found in the literature.
Evolutionary Computation
An important property of evolutionary algorithms is that the search process requires no auxiliary forms of the fitness function, such as derivatives.
In fact, evolutionary computation should be understood as a general adaptable concept for problem solving rather than a collection of related and ready-to-use algorithms.
Outline
Introductions
Fuzzy Systems
Neural Networks
Genetic Algorithms
  Optimization in Computational Intelligence
  Evolutionary Computation
  Other Derivative-Free Optimization
Epilogue
Evolutionary Computation
The majority of current implementations of evolutionary algorithms descend from three strongly related but independently developed approaches:
Genetic algorithms: use binary strings as genes and search for an optimal chromosome.
Evolutionary programming: evolve finite state machines to predict events on the basis of former observations.
Evolution strategies: solve difficult discrete and continuous parameter optimization problems.
Evolutionary Computation
Evolutionary computation mimics the natural selection process in order to find the best-fitted candidate solution (optimization).
Evolutionary algorithms can be viewed as optimization approaches that use random search with some guidance.
Evolutionary Computation
The guidance is fulfilled by a user-specified fitness function.
The basic evolutionary loop:
1. Initialize population P(t) and evaluate P(t).
2. Apply reproduction and crossover on P(t) to yield C(t).
3. Apply mutation on C(t) to yield D(t), and then evaluate D(t).
4. Select P(t+1) from P(t) and D(t) based on the fitness.
5. If the stop criterion is satisfied, stop; otherwise go to step 2.
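The loop above can be sketched in a few lines of Python. This is a toy instance only: the OneMax fitness function, tournament reproduction, one-point crossover, and the parameter values are all illustrative choices, not part of the slides.

```python
import random

rng = random.Random(1)

def fitness(bits):
    """Toy objective (OneMax): count the 1s in the chromosome."""
    return sum(bits)

def crossover(a, b):
    """One-point crossover of two parent chromosomes."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(bits, rate=0.05):
    """Flip each gene with a small probability."""
    return [1 - g if rng.random() < rate else g for g in bits]

def ga(n_bits=20, pop_size=30, generations=60):
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        # Reproduction: fitter chromosomes are more likely to be parents.
        parents = [max(rng.sample(pop, 3), key=fitness) for _ in range(pop_size)]
        # Crossover yields C(t); mutation yields D(t).
        children = [mutate(crossover(parents[i], parents[-1 - i]))
                    for i in range(pop_size)]
        # Select P(t+1) from P(t) and D(t) based on fitness.
        pop = sorted(pop + children, key=fitness, reverse=True)[:pop_size]
    return max(pop, key=fitness)

best = ga()
assert fitness(best) >= 18  # near-optimal chromosome found
```

Because P(t+1) is selected from both P(t) and D(t), the best chromosome is never lost between generations.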
Evolutionary Computation
Evolutionary computation uses three basic operators to manipulate the genetic composition (chromosomes) of a population:
Reproduction is the process of selecting parents for generating offspring. The most highly rated chromosomes in the current generation are the most likely to be copied into the new generation.
Crossover provides a mechanism for chromosomes to mix and match attributes through random processes.
Mutation changes attributes (genes) in the new generation to introduce new possibilities. Mutation is a very important mechanism for avoiding local minima in the optimization search.
Evolutionary Computation
The above operations generate the new chromosomes for evolution; hopefully, the best-fitted solution can be generated.
Besides, randomness plays an essential role in those operations.
One attractive property of evolutionary algorithms is that, since P(t+1) is selected from both P(t) and D(t), the best solution found never degrades over generations.
Evolutionary Computation
However, because evolutionary algorithms must be adapted to the problem at hand, their operations must be designed by the user.
Moreover, if the optimization is constrained, the initial population and the generation of new chromosomes must be chosen carefully.
Outline
Introductions
Fuzzy Systems
Neural Networks
Genetic Algorithms
  Optimization in Computational Intelligence
  Evolutionary Computation
  Other Derivative-Free Optimization
Epilogue
Other Derivative-Free Optimization
Other often-mentioned approaches are ant algorithms (ACS, ACO, etc.) and particle swarm optimization (PSO).
The overall ideas are similar: they all use fitness values to guide the search, with some random mechanisms associated with the search process.
These approaches can usually achieve better search performance than genetic algorithms.
Other Derivative-Free Optimization
This is because genetic algorithms search solution-wise, while swarm algorithms search component-wise.
Genetic algorithms are also more easily trapped in a local minimum when the initial population has some local-optimum properties.
Swarm algorithms can more easily escape such initial local optima.
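The component-wise character of swarm search shows up directly in the PSO update: every coordinate of every particle is pulled toward its personal best and the swarm's global best. A minimal global-best PSO sketch, minimizing the standard sphere function (parameter values are common textbook defaults, not from the slides):

```python
import random

rng = random.Random(0)

def pso(f, dim=2, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5):
    """Minimize f with a basic global-best PSO."""
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]           # each particle's personal best
    gbest = min(pbest, key=f)[:]          # the swarm's global best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):          # component-wise velocity update
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            if f(pos[i]) < f(pbest[i]):
                pbest[i] = pos[i][:]
                if f(pos[i]) < f(gbest):
                    gbest = pos[i][:]
    return gbest

sphere = lambda x: sum(v * v for v in x)
best = pso(sphere)
assert sphere(best) < 1e-2  # swarm settles near the minimum at the origin
```

Note that, like a GA, PSO uses only fitness values and randomness; no derivatives of f are required.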
Epilogue
Computational intelligence is a new vehicle for the next generation of artificial intelligence.
Nevertheless, computational intelligence alone will take you nowhere.
Incorporating it with other techniques may create new frontiers for our dreams.
Thank you for your attention!
Any questions?!
Shun-Feng Su, Professor, Department of Electrical Engineering,
National Taiwan University of Science and Technology
E-mail: [email protected]