06084048

8/13/2019 06084048


Learning by multiple human agents to perform a cooperative control task

Won-jun Hong
Dept. of Brain and Cognitive Engineering
Korea University
Seoul, South Korea
[email protected]

Sung-Phil Kim*
Dept. of Brain and Cognitive Engineering
Korea University
Seoul, South Korea
[email protected]

Abstract: Little is known about the process of learning in a human population during continuous goal-directed tasks. Most multi-agent models of human-like decision-making have focused on learning from discrete feedback, and multi-agent systems in engineering seldom involve cognitive concepts. In this study, we design a multi-agent task in which a group of people controls a common actuator (i.e., a computer cursor) to achieve a task goal (i.e., target acquisition). The cursor movement provides the human agents with real-time visual feedback. Each subject is given control of a specific cursor direction that is not disclosed a priori. The population vector, which was developed to relate a population of motor neurons to arm direction, is used to drive the cursor movement. Learning occurs in the group of human agents as each agent becomes aware of his or her contribution (i.e., controlling direction) to achieving the task goal, using only continuous visual feedback of the cursor position on the screen. Moreover, system-level learning is exhibited even though some agents fail to learn their true directions with high accuracy. The experimental results show that multiple human agents can learn to gain control of a cursor based on visual feedback. In addition, we examine how learning occurs at the individual level and at the collective level.

Keywords: cooperative control; multi-agent learning; human agents; on-line control; visual feedback

I. INTRODUCTION

A multi-agent system (MAS) is defined as a system in which a number of agents attempt to perform tasks through interaction [1]. A MAS has two levels of decision process: the individual level and the collective level [2]. At the individual level (i.e., the agent level), the decision process of each agent is modeled. At the collective level (i.e., the system level), an integrative model is built to include interactions, communication, and coordination among all the agents.

MAS has been used in a wide range of research areas, including robotics, reinforcement learning, complex systems, and evolutionary computation, to name a few. It has also been used to model human decision-making processes [3-4]. In particular, some studies have modeled human-like decision-making and cognitive agents such as BDI (belief-desire-intention) and CODAGE (Cognitive Decision AGEnt). Many applications derived from these decision-making models can be found in economics research (e.g., computational economics [5] and electronic commerce [6]).

In this paper, we present results from a study on the cooperative control of a human MAS. In our experiment, a population of human agents learns to control a computer cursor to acquire targets. The cursor motion is governed by a 2D population vector that linearly combines the direction control outputs from individual agents [7]. Each agent is randomly assigned a single direction prior to the task period and controls the amount of directional input contributed to the population vector at every time instant; no information about the pre-assigned direction is given to the agents. During this target acquisition task, the human agents are provided with real-time visual feedback in the form of the cursor movement. We investigate whether the human agents can learn to exert cooperative control using only visual feedback of the task output and the targets. If so, we seek an underlying mechanism, from the behavior of individual human agents, by which they learn their contributions to cooperative control in a goal-directed task.

To build a quantitative model summarizing the contributions of individuals in a population to cursor motion control, we modify the population vector algorithm that was developed to model movement direction control in motor cortical neuronal populations [7]. With this algorithm, the instantaneous cursor velocity is determined by the vector sum of a set of 2D directional vectors, where each directional vector, assigned to one participant, is scaled by that participant's online input.

II. METHODS

A. Population vector

We adopted the concept of the population vector algorithm (PVA) [7], which has been used for neural decoding, to generate a control signal for a computer cursor. Just as the original PVA assumes that each neuron codes its own preferred direction, we assigned a specific direction to each agent of the human MAS, represented by a direction vector c_i for the i-th agent. These directions were randomly generated from a uniform distribution of angles between 0° and 360°. The cursor velocity at time instant t was then determined by the population vector p(t), the vector sum of the weighted direction vectors of all agents:

978-1-4577-0653-0/11/$26.00 2011 IEEE 2468


Figure 1. An illustration of the client program. The red dot is the moving cursor and the yellow circle is the target. On each trial, the cursor is initially located at the center and one of the blue circles randomly turns yellow. The subjects control their input values between -100 and +100 by adjusting the slider so that the cursor reaches the target. The cursor velocity is determined by the population vector algorithm.

Figure 2. The average time taken to acquire the target over a series of task sets.

$$p(t) = \sum_{i=1}^{N} w_i(t)\, c_i \qquad (1)$$

Here N is the number of agents, and the weight on the i-th agent, w_i(t), was the input value set by that agent at time t, which varied between -100 and 100.
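Equation (1) can be illustrated with a short numerical sketch. The code below is not the authors' implementation: the function name `population_vector` is hypothetical, and it assumes that each direction vector c_i is a unit vector at the agent's assigned angle and that inputs outside the stated range are clipped to [-100, 100].

```python
import math


def population_vector(weights, angles_deg):
    """Return (vx, vy), the sum over agents of w_i(t) * c_i (Eq. 1).

    Assumption: c_i is the unit vector at angle angles_deg[i].
    Assumption: inputs are clipped to the stated range [-100, 100].
    """
    vx = 0.0
    vy = 0.0
    for w, angle in zip(weights, angles_deg):
        w = max(-100.0, min(100.0, float(w)))
        theta = math.radians(angle)
        vx += w * math.cos(theta)
        vy += w * math.sin(theta)
    return vx, vy


if __name__ == "__main__":
    # Two agents at 0 and 90 degrees, both pushing with the maximum input.
    print(population_vector([100, 100], [0, 90]))
    # Two agents with opposite directions and equal inputs cancel out.
    print(population_vector([50, 50], [0, 180]))
```

Because the sum is linear, a negative input from one agent acts exactly like a positive input along the opposite direction, which is why a single slider per agent suffices.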

B. Online cursor control by a human population

To explore cooperative learning in the human MAS, we designed an online computer cursor control task. The goal of the task was to move a cursor to hit a target on the computer screen through the cooperative control of the agents. A similar concept of cooperative control in this type of environment can be found in the block pushing problem [8].

We created an online program to run this cooperative cursor control task. The program was composed of server and client modules. The agents (participants) registered with the program as clients, and the server module collected the client information, generated the cursor movement, and provided feedback to each client. The client program consisted of a cursor board and a slide bar (Figure 1). The targets and the cursor were displayed on the cursor board: the red dot was the cursor and the yellow dot was the target. On each trial, the cursor was initially located at the center and one of the blue dots randomly became the yellow target. All client programs shared the cursor board, shown at the top of the program window. Each agent was assigned a control direction without knowing its value, and generated an input value by moving the slide bar to the left (negative values) or to the right (positive values). The movement of the cursor followed the population vector algorithm.

The real-time communication followed the server-client model. The input from each agent, generated by the slide bar value, was sent to the server asynchronously. The server parsed the input from each agent every 0.25 s and computed the population vector using the new inputs or, where none had arrived, the previous ones. The server then updated the cursor motion and sent the updated cursor position back to each client.
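The update cycle described above can be sketched as a simple tick-based simulation. This is only an illustration under stated assumptions, not the authors' server code: the names `step` and `simulate`, the velocity-to-displacement `gain`, the zero initial inputs, and the representation of asynchronous messages as one dictionary per tick are all hypothetical. Only the 0.25 s interval and the reuse of previous inputs come from the text.

```python
import math

TICK_S = 0.25  # server update interval stated in the text


def step(position, inputs, angles_deg, gain):
    """Move the cursor for one tick along the population vector."""
    vx = sum(w * math.cos(math.radians(a)) for w, a in zip(inputs, angles_deg))
    vy = sum(w * math.sin(math.radians(a)) for w, a in zip(inputs, angles_deg))
    x, y = position
    return (x + gain * vx * TICK_S, y + gain * vy * TICK_S)


def simulate(messages_per_tick, angles_deg, gain=0.01):
    """messages_per_tick: one dict {agent_index: new_input} per tick.

    An agent that sent nothing during a tick keeps its previous input.
    """
    latest = [0.0] * len(angles_deg)  # assumption: inputs start at zero
    position = (0.0, 0.0)             # the cursor starts at the center
    trajectory = [position]
    for messages in messages_per_tick:
        for agent, value in messages.items():
            # Clip to the slider range given in the text.
            latest[agent] = max(-100.0, min(100.0, float(value)))
        position = step(position, latest, angles_deg, gain)
        trajectory.append(position)
    return trajectory


if __name__ == "__main__":
    # One agent at 0 degrees sends a single message and then stays silent;
    # the server keeps applying the last received value on later ticks.
    print(simulate([{0: 100}, {}, {}], [0.0]))
```

Holding the last received value means that a slow or silent client does not stall the cursor; it simply keeps contributing its previous input.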

C. Task Design

The participants performed the online experiment with a confederate. Prior to the experiment, the participants were given a brief summary of the task. They were aware that a specific direction had been assigned to them but that the exact direction was unknown. They were also informed that communication among subjects was not allowed.

A trial of target acquisition by cooperative control succeeded when the human MAS moved the cursor onto the target and held it there for more than one second. A trial failed if the group placed the cursor on a wrong target or could not complete the task within the time limit (30 s). Within a single session of the experiment, each of the 8 targets was highlighted in a random order to designate the target. Each experiment consisted of five sets.
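The hold and time-limit criteria can be expressed as a small check over sampled cursor positions. The sketch below is a simplification under several assumptions: the function name `trial_outcome` and the circular target of a given `radius` are hypothetical, positions are assumed to be sampled once per 0.25 s server update, and the wrong-target rule is omitted because the text does not specify how a wrong placement was detected.

```python
import math


def trial_outcome(samples, target, radius, dt=0.25, hold_s=1.0, limit_s=30.0):
    """Classify a trial from cursor samples taken every dt seconds.

    Returns "success" if the cursor stays within `radius` of `target`
    for more than `hold_s` seconds before `limit_s` elapses, else "fail".
    Assumption: the wrong-target failure rule is not modeled here.
    """
    held = 0.0
    for k, (x, y) in enumerate(samples):
        if (k + 1) * dt > limit_s:
            break  # time limit reached
        if math.hypot(x - target[0], y - target[1]) <= radius:
            held += dt
            if held > hold_s:
                return "success"
        else:
            held = 0.0  # leaving the target resets the hold timer
    return "fail"


if __name__ == "__main__":
    on_target = (10.0, 0.0)
    off_target = (0.0, 0.0)
    # Five consecutive on-target samples span 1.25 s, which exceeds 1 s.
    print(trial_outcome([off_target] * 3 + [on_target] * 5, on_target, 1.0))
    # An interrupted hold does not count.
    print(trial_outcome([on_target] * 4 + [off_target] + [on_target] * 4, on_target, 1.0))
```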

D. Participants

A group of five male subjects aged between 20 and 40 participated in the study. To minimize bias from prior exposure to the task, only subjects who had never participated in earlier experiments of this study were recruited.

III. RESULTS

We used evaluation metrics developed in human-computer interaction (HCI) to assess the performance of non-keyboard devices.

A. Success rates and movement time


Figure 4. Movement error versus epochs. Movement error measures how much the cursor deviates from the ideal straight path.

Figure 3. The final cursor positions at the end of all trials. Dots of each color denote the final positions for one target. The circle of the same color approximately depicts the location and size of that target.

The subject group acquired targets successfully in every set, with success rates ranging from 75% to 87% across sets. The average time taken to acquire the target is shown in Figure 2. After the first set, movement time (i.e., target acquisition time) decreased significantly and then stabilized.

B. Distance of the cursor to target

We investigated whether trials failed because of a complete loss of control or because of difficulty in accurately positioning the cursor within the time limit. When we examined the final cursor positions of all trials, most final positions in the failed trials were near the target rather than at arbitrary locations (see Figure 3). This supports the view that the subjects acquired the ability to control the cursor in the intended direction.

TABLE I. Given directions and estimated directions (unit: degrees)

Given direction   Estimated direction   Error
23                30                    7
59                70                    11
95                90                    5
131               135                   4
167               120                   47

C. Movement Error

Movement error measures how much the cursor deviated from the ideal straight path connecting the center position to the target. Like movement time, movement error began to decrease significantly after the first set.
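The text does not give a formula for movement error. One common formulation in the HCI literature, assumed here for illustration, is the mean absolute perpendicular distance of the sampled cursor path from the straight line joining the start position to the target. The function name `movement_error` is hypothetical.

```python
import math


def movement_error(path, start, target):
    """Mean absolute perpendicular distance of path samples from the
    straight line through `start` and `target`.

    Assumption: this is one standard definition, not necessarily the
    exact measure used in the study.
    """
    ax = target[0] - start[0]
    ay = target[1] - start[1]
    length = math.hypot(ax, ay)
    if length == 0 or not path:
        raise ValueError("need a non-degenerate axis and at least one sample")
    total = 0.0
    for x, y in path:
        # The 2D cross product gives the signed parallelogram area;
        # dividing by the axis length yields the perpendicular distance.
        cross = ax * (y - start[1]) - ay * (x - start[0])
        total += abs(cross) / length
    return total / len(path)


if __name__ == "__main__":
    # A path that wobbles one unit above and below the ideal axis.
    print(movement_error([(1, 1), (2, -1)], (0, 0), (4, 0)))
```

Taking the absolute value before averaging prevents deviations on opposite sides of the ideal path from cancelling each other.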

D. Subject input values versus desired direction

We investigated how the input from each subject changed over the course of the task. The distributions of the subject input values are shown in Figure 5 as a function of the instantaneous desired direction, defined as the direction from the current cursor position to the target at each time instant. For every subject except the first, the distribution largely appeared to reorganize so as to provide meaningful inputs.
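The instantaneous desired direction defined above can be computed with a two-argument arctangent. The sketch below also includes a helper that follows directly from Eq. (1): the component of one agent's contribution w_i c_i along the desired direction is w_i cos(θ_desired − θ_given), so an input helps when this quantity is positive. Both function names are hypothetical, unit direction vectors are assumed, and this projection is offered as an illustration rather than as the analysis the authors performed.

```python
import math


def desired_direction_deg(cursor, target):
    """Direction from the current cursor position to the target,
    in degrees within [0, 360)."""
    angle = math.degrees(math.atan2(target[1] - cursor[1],
                                    target[0] - cursor[0]))
    return angle % 360.0


def contribution_toward_target(w, given_deg, desired_deg):
    """Component of one agent's w * c_i along the desired direction.

    Assumption: c_i is a unit vector at angle given_deg. A positive
    value means the input pushes the cursor toward the target.
    """
    return w * math.cos(math.radians(desired_deg - given_deg))


if __name__ == "__main__":
    print(desired_direction_deg((0, 0), (0, 1)))
    print(contribution_toward_target(50, 0.0, 0.0))
```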

E. Prediction of Given Directions

In the post-experiment interview, we asked the subjects which direction they believed they had been given. Table I compares the given directions and the estimated directions. It shows that most subjects estimated their given directions with a small error: four of the five errors were 11° or less, while one subject was off by 47°.
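The errors in Table I are absolute differences between the given and estimated angles. Because directions are circular, a wrap-around-safe version is sketched below; the function name `angular_error_deg` is hypothetical, and for the values in Table I it reproduces the plain absolute differences.

```python
def angular_error_deg(given, estimated):
    """Smallest absolute angle between two directions, in degrees."""
    diff = abs(given - estimated) % 360.0
    return min(diff, 360.0 - diff)


if __name__ == "__main__":
    table = [(23, 30), (59, 70), (95, 90), (131, 135), (167, 120)]
    print([angular_error_deg(g, e) for g, e in table])
    # → [7.0, 11.0, 5.0, 4.0, 47.0]
```

The wrap-around only matters for pairs that straddle 0°, for example 350° and 10°, whose error is 20° rather than 340°.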

IV. CONCLUSION

Our experiment on the online cursor control task by a population of humans revealed that multiple human agents could learn to gain cooperative control of a cursor using real-time visual feedback. Movement time and movement error decreased as the subjects learned to control the cursor. These results show that the learning process in the human population occurred in a collective way.

Interestingly, the distribution of the subject input values was dispersed over all directions, yet the subjects estimated their given directions with considerably small errors. This learning of an undisclosed direction by each subject through visual feedback of the cursor action is also evident in the subjects' cursor control performance. Even though one subject misjudged the direction and another subject rarely provided input, the group could control the cursor in all directions. This result indicates that the collective control capability adapts to compensate for individual errors.

We will continue to investigate not only how the group decision changes with changes in the environment, but also whether competition between groups has an impact on collective learning.


Figure 5. The spread of subject inputs. Each row corresponds to a subject and each column to an epoch. In each circle, the red line represents the given direction and the black line denotes the learned direction of the subject. Each point consists of the angle between the target and the cursor and the scalar value input by the subject. A blue point means that the subject input a positive value.


V. DISCUSSIONS

In the follow-up study, we will focus on how each subject becomes aware of his or her contributions. To this end, we will consider improving our client program to record the subjects' continuous input values more accurately. As shown in Figure 5, the subjects seemed to choose several discrete input values in the positive and negative ranges although they could choose any value in the continuous input space. This may suggest that the input can be simplified to a set of discrete choices. We will also examine the effect of the number of subjects on learning at both the collective and individual (agent) levels.

In this study, we showed that a learning process occurs in a system of multiple human agents driven by real-time visual feedback. We regard this process as multi-agent learning because it progresses at two separate levels. We will perform an in-depth study of how collective learning is composed of individual agent learning processes.

ACKNOWLEDGMENT

This research was supported by the WCU (World Class University) Program (R31-10008) and the Basic Science Research Program (R1009541) of the National Research Foundation (NRF) of Korea, funded by the Ministry of Education, Science and Technology.

REFERENCES

[1] L. Panait and S. Luke, "Cooperative Multi-Agent Learning: The State of the Art," Autonomous Agents and Multi-Agent Systems, vol. 11, no. 3, pp. 387-434, November 2005.

[2] J. Kant and S. Thiriot, "Modeling one Human Decision Maker with a Multi-Agent System: the CODAGE Approach," AAMAS '06: Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent Systems, May 2006.

[3] E. Norling, "Folk Psychology for Human Modeling: Extending the BDI Paradigm," AAMAS '04: Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, July 2004.

[4] S. Parsons and M. Wooldridge, "Game theory and decision theory in Agent-Based Systems," Autonomous Agents and Multi-Agent Systems, vol. 5, no. 3, pp. 234-254, September 2002.

[5] B. Lebaron, "Agent-based computational finance: suggested readings and early research," Journal of Economic Dynamics and Control, vol. 24, no. 5-7, pp. 679-702, 2002.

[6] F. Guttman, A. G. Moukas, and P. Maes, "Agent-mediated electronic commerce," The Knowledge Engineering Review, vol. 13, no. 2, pp. 147-159, 2002.

[7] A. P. Georgopoulos, A. B. Schwartz, and R. E. Kettner, "Neuronal population coding of movement direction," Science, vol. 233, no. 4771, pp. 1416-1419, September 1986.

[8] S. Sen, M. Sekaran, and J. Hale, "Learning to coordinate without sharing information," in Proceedings of the Twelfth National Conference on Artificial Intelligence, pp. 426-431, 1994.
