1 اتوماتای یادگیر instructor : saeed shiry. مقدمه an automaton is a machine or...

1

یادگیر اتوماتای

Instructor : Saeed Shiry

مقدمه An automaton is a machine or control mechanism

designed to automatically follow a predetermined sequence of operations or respond to encoded instructions.

The concept of learning automaton grew out of a fusion of the work of psychologists in modeling observed behavior, the efforts of statisticians to model the choice of experiments based on past observations, the attempts of operation researchers to implement optimal strategies in the context of the two-armed bandit problem, and the endeavors of system theorists to make rational decisions in random environments

2

Stochastic Learning Automata–Reinforcement Learning

3

Stochastic Learning Automata–Reinforcement Learning

In classical control theory, the control of a process is based on complete knowledge of the process/system. The mathematical model is assumed to be known, and the inputs to the process are deterministic functions of time.

Later developments in control theory considered the uncertainties present in the system.

Stochastic control theory assumes that some of the characteristics of the uncertainties are known. However, all those assumptions on uncertainties and/or input functions may be insufficient to successfully control the system if changes.

It is then necessary to observe the process in operation and obtain further knowledge of the system, i.e., additional information must be acquired on-line since a priori assumptions are not sufficient.

One approach is to view these as problems in learning.

4

reinforcement learning A crucial advantage of reinforcement learning

compared to other learning approaches is that it requires no information about the environment except for the reinforcement signal .

A reinforcement learning system is slower than other approaches for most applications since every action needs to be tested a number of times for a satisfactory performance. Either the learning process must be much faster than

the environment changes, or the reinforcement learning must be combined with an adaptive forward model that anticipates the changes in the environment

5

applications of learning automata ُ�Some Recent applications of learning automata to

real life problems: control of absorption columns, Bioreactors, control of manufacturing plants, pattern recognition , graph partitioning , active vehicle suspension, path planning for manipulators , distributed fuzzy logic processor training , path planning and action selection for autonomous mobile robots.

6

learning paradigm The the learning automaton presents may be stated as follows: a finite number of actions can be performed in a random

environment. When a specific action is performed the environment provides a random

response which is either favorable or unfavorable. The objective in the design of the automaton is to determine how

the choice of the action at any stage should be guided by past actions and responses.

The important point to note is that the decisions must be made with very little knowledge concerning the “nature” of the environment. The uncertainty may be due to the fact that the output of the

environment is influenced by the actions of other agents unknown to the decision maker.

7

The automaton and the environment

8

The environment

The environment in which the automaton “lives” responds to the action of the automaton by producing a response, belonging to a set of allowable responses, which is probabilistically related to the automaton action.

The term environment is not easy to define in the context of learning automata. The definition encompasses a large class of unknown random media in which an automaton can operate.

9

The environment

Mathematically, an environment is represented by a triple: {, c, } represents a finite action/output set, represents a (binary) input/response set, and c is a set of penalty probabilities, where each

element ci corresponds to one action i of the set .

10

The environment The output (action) (n) of the automaton belongs to the set ,

and is applied to the environment at time t = n. The input (n) from the environment is an element of the set

and can take on one of the values 1 and 2. In the simplest case, the values bi are chosen to be 0 and 1,

1 is associated with failure/penalty response. The elements of c are defined as:

Prob(n) (n) c (i , ,...)

Therefore ci is the probability that the action i will result in a penalty input from the environment.

When the penalty probabilities ci are constant, the environment is called a stationary environment.

11

Models P-model

Models in which the input from the environment can take only one of two values, 0 or 1, are referred to as P-models. In this simplest case, the response value of 1 corresponds to an “unfavorable” (failure, penalty) response, while output of 0 means the action is “favorable”

Q-model A further generalization of the environment allows finite

response sets with more than two elements that may take finite number of values in an interval [a, b]. Such models are called Q-models.

S-model When the input from the environment is a continuous random

variable with possible values in an interval [a, b], the model is named S-model.

12

The automaton The automaton can be represented by a quintuple

{, , , F(•,•), H(•,•)} where:

is a set of internal states. At any instant n, the state (n) is an element of the finite

set = {1, 2,..., s} is a set of actions (or outputs of the automaton). The output or action of

an automaton an the instant n, denoted by (n), is an element of the finite set

= {1, 2,..., r} is a set of responses (or inputs from the environment). The input from

the environment (n) is an element of the set which could be either a finite

set or an infinite set, such as an interval on the real line: {1, 2 ,..., m} or {(a,b)}

13

The automaton F(•,•): x is a function that maps the current

state and input into the next state. F can be deterministic or stochastic: (n 1) F[(n),(n)] H(•,•): xis a function that maps the current

state and input into the current output. If the current output depends on only the current state, the

automaton is referred to as state-output automaton. In this case, the function H(•,•) is replaced by an output

function G(•): , which can be either deterministic or stochastic:

(n) G[(n)]

14

The Stochastic Automaton

In stochastic automaton at least one of the two mappings F and G is stochastic.

If the transition function F is stochastic, the elements fij

of F represent the probability that

the automaton moves from state i to state j

following an input :

15


For the mapping G, the definition is similar:

Since fij are probabilities, they lie in the closed interval [a, b]; and to conserve probability measure we must have:

16


17

Automaton and Its Performance Evaluation A learning automaton generates a sequence of

actions on the basis of its interaction with the environment.

If the automaton is “learning” in the process, its performance must be superior to “intuitive” methods.

To judge the performance of the automaton, we need to set up quantitative norms of behavior.

The quantitative basis for assessing the learning behavior is quite complex, even in the simplest P-model and stationary random environments.

To introduce the definitions for “norms of behavior”, we will consider this simplest case

18

Norms of Behavior If no prior information is available, there is no basis in

which the different actions i can be distinguished. In such a case, all action probabilities would be equal to a

“pure chance” situation. For an r-action automaton, the action probability vector

p(n) = Pr {(n) = i } is given by:

Such an automaton is called “pure chance automaton,” and will be used as the standard for comparison.

19

Norms of Behavior

Consider a stationary random environment with penalty probabilities

We define a quantity M(n) as the average penalty for a given action probability vector:

20

Norms of Behavior

For the pure-chance automaton, M(n) is a constant denoted by Mo:

Also note that:

i.e., E[M(n)] is the average input to the automaton.

21

Norms of Behavior

22

Variable Structure Automata

A more flexible learning automaton model can be created by considering more general stochastic systems in which the action probabilities (or the state transitions) are updated at every stage using a reinforcement scheme.

For simplicity, we assume that each state corresponds to one action, i.e., the automaton is a state-output automaton.

reinforcement scheme

A reinforcement scheme can be represented as follows:

where T1 and T2 are mappings.

Linear Reinforcement Schemes

general linear schemes

the parameter a is associated with reward response, and the parameter b with penalty response. If the learning parameters a and b are equal, the scheme is called the linear reward-penalty scheme LR-P

Linear Reinforcement Schemes by analyzing eigen values of the resulting difference

equation, it can be shown that asymptotic solution of the set of difference equations enables us to conclude:

Therefore, the multi-action automaton using the LR-P scheme is expedient for all initial action probabilities and in all stationary random environments.

Expediency Expediency is a relatively weak condition on the

learning behavior of a variable-structure automaton. An expedient automaton will do better than a pure

chance automaton, but it is not guaranteed to reach the optimal solution.

In order to obtain a better learning mechanism, the parameters of the linear reinforcement scheme are changed as follows: if the learning parameter b is set to 0, then the scheme is

named the linear reward-inaction scheme LR-I. This means that the action probabilities are updated in the

case of a reward response from the environment, but no penalties are assessed.

Interconnected Automata it is possible that there are more than one automata in an environment. If the interaction between different automata is provided by the

environment, the case of multi-automata is not different than a single automaton case.

The environment reacts to the actions of multiple automata, and the environment output is a result of the combined effect of actions chosen by all automata.

If there is direct interaction between the automata, such as the hierarchical (or sequential) automata models, the actions of some automata directly depend on the actions of others.

It is generally recognized that the potential of learning automata can be increased if specific rules for interconnections can be established.

Example: A Vehicle Control Since each vehicle’s planning layer will include two automata — one for lateral, the

other for longitudinal actions — the interdependence of these two sets of actions automatically results in an interconnected automata network.

Application of Learning Automata to Intelligent Vehicle Control

Designing a system that can safely control a vehicle’s actions while contributing to the optimal solution of the congestion problem is difficult

When the design of a vehicle capable of carrying out tasks such as vehicle following at high speeds, automatic lane tracking, and lane changing is complete, we must also have a control/decision structure that can intelligently make decisions in order to operate the vehicle in a safe way.

Vehicle Control

The aim here is to design an automata system that can learn the best possible action (or action pairs: one for lateral, one for longitudinal) based on the data received from on-board sensors.

The Model For our model, we assume that an intelligent vehicle is capable

of two sets of lateral and longitudinal actions. Lateral actions are shift-to-left-lane (SL), shift-to-right-lane (SR)

and stayin- lane (SiL). Longitudinal actions are accelerate (ACC), decelerate (DEC)

and keep-same-speed(SM). There are nine possible action pairs provided that speed

deviations during lane changes are allowed.

Sensors An autonomous vehicle must be able to ‘sense’ the environment around

itself. In the simplest case, it is to be equipped with at least one sensor

looking at the direction of possible vehicle moves. Furthermore, an autonomous vehicle must also have the knowledge of

the rate of its own displacement. Therefore, we assume that there are four different sensors on board the

vehicle: headway sensor, two side sensors, and a speed sensor. The headway sensor is a distance measuring device which returns the

headway distance to the object in front of the vehicle. An implementation of such a device is a laser radar.

Side sensors are assumed to be able to detect the presence of a vehicle traveling in the immediately adjacent lane. Their outputs are binary. Infrared or sonar detectors are currently used for this type of sensor.

The speed sensor is simply an encoder returning the current wheel speed of the vehicle.

Automata in a multi-teacher environment connected to the physical layers

Mapping

The mapping F from sensor module outputs to the input b of the automata can be a binary function (for a P-model environment), a linear combination of four teacher outputs, or a more complex function as is the case for this application.

An alternative and possibly more ideal model would use a linear combination of teacher outputs with adjustable weight factors (e.g., S-model environment).

buffer in regulation layer

The regulation layer is not expected to carry out the action chosen immediately. This is not even possible for lateral actions. To smooth the system output, the regulation layer carries out an action if it is recommended m times consecutively by the automaton, where m is a predefined parameter less than or equal to the number of iterations per second.

35

مراجع

Phd Thesis:

Unsal, Cem , Intelligent Navigation of Autonomous Vehicles in an Automated Highway System: Learning Methods and Interacting Vehicles Approach

http://scholar.lib.vt.edu/theses/available/etd-5414132139711101/

http://ceit.aut.ac.ir/~shiry/lecture/machine-learning/tutorial/LA/

36

Amirkabir Univerity - Machine learning Course

ر يادگي ياتوماتايسلول

Cellular Learning Automata


38

(CA)ي سلولي اتوماتا

اتوماتاي سلولي شبكه اي از سلولها ست که در يک توپولوژي مشخص قرار گرفته اند.

.براي هر سلول، سلولهاي ديگري به عنوان همسايه وجود دارند دو ويژگي همسايگي: •

yه سلول ي همساx( اگر سلول 2( هر سلول همسايه خودش است. 1 است.xه سلول ي هم همساyاست, سلول

براي هر سلول تعدادي متناهي حالت (state) وجود دارد که در هر لحظه هر ست. حالت هاسلول در يکي از اين


39

در CAشود که بر اساس حالت يف مي تعرين محلي از قواني مجموعه ا کند.ي آن سلول را مشخص ميگان سلول، حالت بعديهمسا

Neighborhood Rules

Next Step


40

ويژگي هاي اساسي اتوماتاي سلولي :عبارتند از

آ- فضايي گسسته دارند.ب- زمان بصورت گسسته پيش مي رود.

پ- حالتهKKاي کKKه سKKلولها مي تواننKKد دارا باشKKند متنKKاهي است.

ت- تمام سلولها يكسان ميباشند.ث- عمKل بKروز در آوردن سKلولها بصKورت همگKام مي باشد.

ج- قKوانين تصKادفي نبKوده و بطKور قطعي اعمKال مي شوند.چ- قKKKانون در هKKKر محKKKل فقKKKط بسKKKتگي بKKKه مقKKKادير

همسايه هاي آن دارد.


41

Game of Life: مثال

ن :يقوان

ماند.يه آن زنده باشند، زنده ميا سه همساي. هر سلول که دو 1

رد.ي ميت ميه زنده به خاطر ازدحام جمعيشتر همسايا بي. هر سلول با چهار 2

رد.ي مي مييه زنده از تنهايچ همسايا هيک ي.هر سلول با 3

شود.يه زنده داشته باشد متولد ميقا سه همساي. هر سلول مرده که دق4


42

ي اتوماتايتها و نقصهايقابليسلول

CAار ين بسيل شده است که قواني اجزاء مشابه و ساده تشکيکسري است که از ي مدل را مدل يده ايچي پيستمهاي تواند سيت ميز بر آنها حاکم است. اما در نهاي نيساده محل

کند.

ک اشکال عمده يCAک کاربرد خاص است و ي ياز براين مورد ني قوانين فرم قطعيي تع باشد.ي مناسب مي قطعيستمهاي مدل کردن سي براCAنکه يا

ن، با گذشت زمان ي قوانين فرم قطعيياز به تعيم که بدون ني باشيد به دنبال روشي پس بان مناسب استخراج شوند.يقوان

ن ي از ايکي به آنها يريادگيت ي و افزودن قابلCA يهوشمند کردن سلولهاروشهاست!


43

اتوماتاي يادگير سلولي

CLA ک يCA ک ي است که هر سلول آن بهLAباشد.ي مجهز م

اين مدل ازCAکه داراست.يريادگيت يل قابلي بهتر است, به دل بر و LAاز يرا مجموعه اي دارد زيز برتري ن LAتوانند با هم فعل و ي هاست که م

انفعال داشته باشند.يده اصلي ا CLAتها در ي انتقال وضعيم چگونگي تنظي برايريادگين است که از ي که ا

CAم.ي استفاده کن


44

تعريف رياضي اتوماتاي يادگير سلولي

است به طوريكه: بعدي يك چندتايي dاتوماتاي يادگير سلولي ),,,,( FNAZCLA d

ي]ك ش]بكه از d ت]ايي ه]اي م]رتب از اع]داد ص]حيح مي باش]د. اين شبك]ه مي توا]ند يك شبكه ]متن]اهي]، نيمه ]متناهي ي]ا نام]تناهي] باشد.

.يك مجموعه متناهي از حالتها مي باشد ي]ك مجموع]ه از اتوماتاه]اي ي]ادگير ، (LA) اس]ت ك]ه ه]ر اتومات]اي

يادگير به يك سلول نسبت داده مي شود. ، مي باش]د ك]ه ي]ك زي]ر مجموع]ه متن]اهي از

بردار همسايگي خوانده مي شود. ق]انون محلي -CLA مي باش]د ب]ه طوريك]ه مجموع]ه

مق]ادي]ري ا]س]ت ك]ه مي] تو]ان]د ب]ه عن]وان] س]يگ]نال ت]قوي]تي پذيرفت]ه شود.

dZ

},...,{ 1 mxxN dZ

mF :

A


45

ي اتوماتايکاربردهاير سلوليادگي

كاربردهاي پردازش تصويري )حذف نويز و تعيين لبه(

تخصيص منابع در شبكه هاي موبايل )تخصيص كانالو پذيرش كانال(

مدل سازي پديده ها )انتشار شايعه و بازار اقتصادي(

گريدهاي محاسباتي طراحي سيستمهاي چند عامله الگوريتمهاي بهينه سازي


46

در پردازش CLAکاربرد ريتصو

که هر ي شود بطوري نگاشت ميدوبعد ير سلوليادگي ير در اتوماتاي ابتدا تصو شود.ير داده ميادگي ي اتوماتاي از سلولهايکيکسل به يپ

ي سلولها و قانون محلي ممکن براي با توجه به کاربرد خاص مورد نظر، اقدامها شود.يمه و پاداش تعيين ميجر


47

ر با استفاده ي تصويص لبه هايمثال: تشخCLAاز

کسل به لبه ي. پ2کسل به لبه متعلق است. ي. پ1 باشد: ي دو اقدام مي هر اتوماتا داراست.يمتعلق ن

کند و ي خود را انتخاب مي از اقدامهايکي ي در ابتدا هر اتوماتا به صورت تصادف که يي کنند کمتر از تعداد اتوماتاهاي که اقدام اول را انتخاب مييالبته تعداد اتوماتاها

شود.ي کنند در نظر گرفته مياقدام دوم را انتخاب م

سه يگانش مقايت همسايت خود را با وضعي در هر مرحله از تکرار هر اتوماتا وضعن اساس که:يد. بر اي نمايح ميسه رفتار خود را تصحين مقاي کند و بر اساس ايم

لبه باشند، احتماال يک سلول رويه ي همساي اگر دو تا چهار اتوماتا• لبه است.ين سلول هم رويا لبه يک سلول رويه ي همسايشتر از چهار اتوماتايا بيک ي اگر •

ست.ي لبه نين سلول همروينباشند، احتماال ا


48

ن صورت:يبد

ييتا8 يگي همساي اقدام اول خود را انتخاب کند و تعداد اتوماتاهاCLAک سلول در ي اگر •ن دو تا چهار باشد، اقدام انتخاب شده مناسب يآن که همان اقدام را انتخاب کرده اند ب

رد.ي گيبوده و پاداش م

يگي همساي اقدام دوم خود را انتخاب کند و تعداد اتوماتاهاCLAک سلول در ي اگر •شتر از چهار باشد، اقدام انتخاب يا بيک ي آن که همان اقدام را انتخاب کرده اند ييتا8

رد.ي گيشده مناسب بوده و پاداش م

شود.يمه مي است که اقدام انتخاب شده نادرست بوده و جرين معنيگر به اي دي حالتها•

دار يت پايه اتوماتاها به وضعي که کليا تا زمانين و يات فوق را به تعداد معي عملم.ي کنيبرسند تکرار م


49

در CLA عملکرد روش استخراج لبه


50

منابع

H.Beigy and M.R.Meybodi. “A Mathematical Framework For Cellular Learning Automata”, Advanced in Complex Systems ,2004.

ي و کاربردهاير سلوليادگي ي, “اتوماتايع خوارزمي، محمد رفيبدي محمد رضا م .1382ز يير، پايرکبير”, دانشگاه اميآن در پردازش تصاو

1 اتوماتای یادگیر instructor : saeed shiry. مقدمه an automaton is a machine or...

Documents