Learning by multiple human agents to perform a
cooperative control task
Won-jun Hong
Dept. of Brain and Cognitive Engineering
Korea University, Seoul, South Korea
Sung-Phil Kim*
Dept. of Brain and Cognitive Engineering
Korea University, Seoul, South Korea
Abstract: Little is known about the process of learning in
a human population during continuous goal-directed tasks. Most
models of human-like decision-making in multi-agent systems
have focused on learning from discrete feedback, and
multi-agent systems in engineering seldom involve cognitive
concepts. In this study, we design a multi-agent task
in which a group of people controls a common actuator (i.e., a
computer cursor) to achieve the task goal (i.e., target acquisition).
The cursor movement provides the human agents with real-time visual
feedback. Each subject controls a specific
cursor direction that is not disclosed a priori. The population
vector, which was developed to relate a population of motor
neurons to arm direction, is used to drive the cursor movement.
Learning occurs in the group of human agents as each agent
becomes aware of their contribution (i.e., their controlling
direction) to achieving the task goal, using only continuous
visual feedback of the cursor position on the screen. Moreover,
system-level learning is exhibited even though some agents fail
to learn their true directions with high accuracy. The
experimental results show that multiple human agents can
learn to gain control of a cursor based on visual feedback. In
addition, we examine how learning occurs at the individual
level and the collective level.
Keywords: cooperative control; multi-agent learning; human
agents; on-line control; visual feedback
I. INTRODUCTION
A multi-agent system (MAS) is defined as a system in which a number of agents attempt to perform tasks through interaction [1]. A MAS has two levels of decision process: the individual level and the collective level [2]. At the individual level (i.e., the agent level), the decision process of each agent is modeled. At the collective level (i.e., the system level), an integrative model is built to include interactions, communication, and coordination among all the agents.
MAS have been used in a wide range of research areas, including robotics, reinforcement learning, complex systems, and evolutionary computation, to name a few. They have also been used to model human decision-making processes [3-4]. In particular, some studies have modeled human-like decision-making and cognitive agents such as BDI (belief-desire-intention) and CODAGE (Cognitive Decision AGEnt). Many applications derived from these decision-making models can be found in economics research (e.g., computational economics [5] and electronic commerce [6]).
In this paper, we present the results of a study on the cooperative control of a human MAS. In our experiment, a population of human agents learns to control a computer cursor to acquire targets. The cursor motion is governed by a 2D population vector that linearly combines the direction control outputs from individual agents [7]. Each agent is randomly assigned a single direction prior to the task period and controls the amount of directional input that forms the population vector at every time instant; no information about the pre-assigned direction is given to the agents. During this target acquisition task, the human agents are provided with real-time visual feedback in the form of the cursor movement. We investigate whether the human agents can learn to exert cooperative control using only the visual feedback of the task output and the targets. If so, we seek to find the underlying mechanism by which individual human agents learn their contributions to cooperative control in a goal-directed task.
To build a quantitative model summarizing the contributions of individuals in a population to cursor motion control, we modify the population vector algorithm that was developed to model movement direction control in motor cortical neuronal populations [7]. With this algorithm, the instantaneous cursor velocity is determined by the vector sum of a set of 2D directional vectors, where each directional vector, assigned to one participant, is scaled by that participant's online input.
II. METHODS
A. Population vector
We adopted the concept of the population vector algorithm (PVA) [7], which has been used for neural decoding, to generate a control signal for a computer cursor. Just as the original PVA assumes that each neuron codes its own preferred direction, we assigned a specific direction to each agent of the human MAS, represented by a direction vector c_i for the i-th agent. These directions were randomly generated from a uniform distribution of angles between 0 and 360 degrees. The cursor velocity at time instant t was then determined by the population vector p(t), a vector sum of the weighted direction vectors of all agents:
978-1-4577-0653-0/11/$26.00 ©2011 IEEE
Figure 1. An illustration of the client program. The red dot is the moving
cursor and the yellow circle is the target. On each trial, the cursor is
initially located at the center and one of the blue circles, chosen at random,
turns yellow. The subjects set their input values between -100 and +100 by
adjusting the slider so that the cursor reaches the target. The cursor
velocity is determined by the population vector algorithm.
Figure 2. The average time taken to acquire the target over a series of
task sets.
$$p(t) = \sum_{i=1}^{N} w_i(t)\, c_i. \tag{1}$$
The weight on the i-th agent, w_i(t), was the input value given by the agent at time t and varied between -100 and 100.
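As an illustration of Eq. (1), the following minimal sketch computes the population vector from a set of agent inputs. It assumes that each c_i is a unit vector at the agent's assigned angle (the text does not state a normalization), and the function names are hypothetical rather than taken from the experimental software.

```python
import math

def direction_vector(angle_deg):
    """Unit vector c_i for an agent whose assigned direction is angle_deg.

    Assumption: c_i has unit length; the paper does not state this."""
    theta = math.radians(angle_deg)
    return (math.cos(theta), math.sin(theta))

def population_vector(weights, angles_deg):
    """Return p(t) = sum_i w_i(t) * c_i as an (x, y) tuple."""
    if len(weights) != len(angles_deg):
        raise ValueError("one weight is required per agent")
    px = py = 0.0
    for w, angle in zip(weights, angles_deg):
        if not -100 <= w <= 100:
            raise ValueError("inputs are limited to the range [-100, 100]")
        cx, cy = direction_vector(angle)
        px += w * cx
        py += w * cy
    return (px, py)

# Example with the five given directions reported in Table I.
angles = [23, 59, 95, 131, 167]
p = population_vector([100, 50, 0, -50, -100], angles)
print(p)
```

Because the inputs may be negative, an agent can push the cursor along its assigned direction or directly against it, so five agents with directions confined to one half-plane can still span the whole 2D plane.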
B. Online cursor control by a human population
To explore cooperative learning in the human MAS, we designed an online computer cursor control task. The goal of the task was to move a cursor to hit a target on the computer screen through the cooperative control of the agents. A similar concept of cooperative control in this type of environment can be found in the block pushing problem [8].
We created an online program to run this cooperative cursor control task. The program was composed of a server module and a client module. The agents (participants) registered with the program as clients, and the server module collected the client information, generated the cursor movement, and provided feedback to each client. The client program consisted of a cursor board and a slide bar (Figure 1). The targets and the cursor were displayed on the cursor board: the red dot is the cursor and the yellow circle is the target. On each trial, the cursor was initially located at the center and one of the blue circles, chosen at random, became the yellow target. All client programs shared the cursor board, shown at the top of the program window. Each agent was assigned a control direction without knowing its value and generated an input value by moving the slide bar to the left (negative values) or to the right (positive values). The cursor moved according to the population vector algorithm.
The real-time communication followed the server-client model. The input from each agent, given by the slide bar value, was sent to the server asynchronously. The server parsed the input from each agent every 0.25 s and computed the population vector using each agent's new input if one had arrived, or its previous input otherwise. The server then updated the cursor motion and sent the updated cursor position back to each client.
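The update cycle described above can be sketched as follows. This is only an illustrative model of the server logic, not the authors' program: the class name, the `gain` factor that converts the population vector into screen units, and the absence of any networking code are all assumptions.

```python
import math

TICK_SECONDS = 0.25  # server update period stated in the text

class CursorServer:
    """Hypothetical model of the server-side cursor update."""

    def __init__(self, angles_deg, gain=0.01):
        # Unit direction vectors, one per agent (assumed normalization).
        self.directions = [
            (math.cos(math.radians(a)), math.sin(math.radians(a)))
            for a in angles_deg
        ]
        # Last known input of each agent; reused until a new one arrives.
        self.latest = [0.0] * len(angles_deg)
        self.position = (0.0, 0.0)  # cursor starts at the center
        self.gain = gain            # assumed velocity scale, not from the paper

    def receive(self, agent_index, value):
        """Store an asynchronously arriving slider value, clamped to [-100, 100]."""
        self.latest[agent_index] = max(-100.0, min(100.0, float(value)))

    def tick(self):
        """Advance the cursor by one 0.25 s step and return its new position."""
        px = sum(w * c[0] for w, c in zip(self.latest, self.directions))
        py = sum(w * c[1] for w, c in zip(self.latest, self.directions))
        x, y = self.position
        self.position = (
            x + self.gain * px * TICK_SECONDS,
            y + self.gain * py * TICK_SECONDS,
        )
        return self.position
```

Keeping the last input of each agent means that an agent who stops moving the slider continues to contribute its most recent value, which matches the description of the server using "the new or the previous inputs".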
C. Task Design
The participants performed the online experiment with a confederate. Prior to the experiment, the participants were given a brief summary of the task. They were aware that a specific direction had been assigned to them but did not know the exact direction. They were also informed that communication among subjects was not allowed.
A trial of target acquisition by cooperative control succeeded when the human MAS moved the cursor to the target and held it there for more than one second. A trial failed if the group placed the cursor on the wrong target or could not complete the task within the time limit (30 s). In a single session, each of the 8 targets was highlighted in a random order to designate the target. Each experiment consisted of five sets.
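The trial criteria can be expressed as a small classification routine. The sketch below is an interpretation under stated assumptions: cursor positions are sampled once per 0.25 s server tick, targets are circles of a common `radius`, and touching a wrong target ends the trial immediately. None of these details is specified in the text, and all names are hypothetical.

```python
import math

def classify_trial(samples, target, other_targets, radius,
                   dt=0.25, hold_time=1.0, time_limit=30.0):
    """Classify one trial as 'success', 'wrong_target', or 'timeout'.

    samples: cursor (x, y) positions, one per dt seconds (assumed sampling).
    target: center of the highlighted target.
    other_targets: centers of the remaining targets.
    """
    held = 0.0
    for k, pos in enumerate(samples):
        elapsed = (k + 1) * dt
        if elapsed > time_limit:
            return "timeout"
        # Assumption: touching any non-highlighted target fails the trial.
        if any(math.dist(pos, other) <= radius for other in other_targets):
            return "wrong_target"
        if math.dist(pos, target) <= radius:
            held += dt
            if held > hold_time:  # held for MORE than one second
                return "success"
        else:
            held = 0.0  # the hold must be continuous
    return "timeout"
```

With a 0.25 s sampling period, the hold condition is met on the fifth consecutive sample inside the target, since four samples amount to exactly one second and the criterion requires more than one second.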
D. Participants
A group of five male subjects aged between 20 and 40 participated in the study. To minimize bias from prior involvement in earlier experiments, only subjects who had never participated in the study were recruited.
III. RESULTS
We used evaluation metrics developed in HCI to assess the performance of non-keyboard devices.
A. Success rates and movement time
Figure 4. Movement error versus epochs. Movement error measures how
much the cursor deviates from the ideal straight path.
Figure 3. The final cursor positions at the end of all trials. The dots in
different colors denote the final positions for each target. The circles in
the same color approximately depict the target location and size.
The subject group acquired the targets successfully in all the sets. The success rates ranged from 75% to 87%. The average time taken to acquire the target is shown in Figure 2. After the first set, movement time (i.e., target acquisition time) decreased significantly and then stabilized.
B. Distance of the cursor to target
We investigated whether trials failed due to a complete loss of control or due to difficulty in accurately positioning the cursor within the time limit. When we examined the final cursor positions of all trials, most final positions in the failed trials were near the target rather than at arbitrary locations (see Figure 3). This suggests that the subjects had acquired the ability to control the cursor in the intended direction.
TABLE I. Given directions and estimated directions (unit: degree)
Given direction   Estimated direction   Error
23                30                    7
59                70                    11
95                90                    5
131               135                   4
167               120                   47
C. Movement Error
Movement error measures how much the cursor deviated from the ideal straight path connecting the center position to the target. Like movement time, movement error decreased significantly after the first set.
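The paper does not give a formula for movement error. One common formulation, assumed here, is the mean absolute perpendicular distance of the sampled cursor positions from the straight line joining the start position and the target. The sketch below implements that assumption; the function name is hypothetical.

```python
import math

def movement_error(path, start, target):
    """Mean absolute perpendicular distance of path points from the
    straight start-to-target line (assumed definition of movement error)."""
    if not path:
        raise ValueError("path must contain at least one sample")
    dx = target[0] - start[0]
    dy = target[1] - start[1]
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("start and target must differ")
    total = 0.0
    for x, y in path:
        # |2D cross product| / |line length| = perpendicular distance.
        cross = dx * (y - start[1]) - dy * (x - start[0])
        total += abs(cross) / length
    return total / len(path)

# A path that wanders one unit above and below a horizontal ideal line.
print(movement_error([(1, 1), (2, -1), (3, 0)], (0, 0), (10, 0)))
```

Under this definition, a cursor that travels exactly along the ideal path has a movement error of zero, and larger values indicate stronger lateral deviation regardless of its sign.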
D. Subject input values versus desired direction
We investigated how the input from each subject changed over the course of the task. The distributions of the subject input values are shown in Figure 5 as a function of the instantaneous desired direction. Here, the instantaneous desired direction is defined as the direction from the current cursor position to the target at each time instant. For the most part, the distribution of each subject's input appeared to reorganize so as to provide meaningful inputs, with the exception of the first subject.
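The instantaneous desired direction follows directly from its definition. The sketch below computes it and, as a purely hypothetical illustration, estimates the direction a subject appears to have learned as the input-weighted vector sum of the desired directions. The paper does not describe how the learned direction was obtained, so the second function is an assumption and not the authors' method.

```python
import math

def desired_direction(cursor, target):
    """Direction in degrees, in [0, 360), from the cursor to the target."""
    angle = math.degrees(math.atan2(target[1] - cursor[1],
                                    target[0] - cursor[0]))
    return angle % 360.0

def estimate_learned_direction(desired_angles_deg, inputs):
    """Hypothetical estimator: angle of sum_k w_k * u(theta_k), where u is
    the unit vector at desired direction theta_k and w_k the signed input.
    Returns None when the weighted sum has zero length."""
    sx = sum(w * math.cos(math.radians(a))
             for a, w in zip(desired_angles_deg, inputs))
    sy = sum(w * math.sin(math.radians(a))
             for a, w in zip(desired_angles_deg, inputs))
    if math.hypot(sx, sy) == 0:
        return None
    return math.degrees(math.atan2(sy, sx)) % 360.0
```

In this estimator, a negative input given while the target lies opposite to the subject's direction contributes in the same way as a positive input given while the target lies along it, so both kinds of correct behavior reinforce the same estimate.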
E. Prediction of Given Directions
In the post-experiment interview, we asked the subjects which direction they believed they had been given. Table I compares the given directions and the estimated directions. It shows that most subjects estimated their given directions with a small error.
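The error column of Table I can be reproduced with a short calculation. The sketch below uses the shortest angular distance, which handles wrap-around at 360 degrees; for the values in Table I it coincides with the plain absolute difference. The function name is hypothetical.

```python
def angular_error(given_deg, estimated_deg):
    """Shortest angular distance in degrees between two directions."""
    diff = abs(given_deg - estimated_deg) % 360
    return min(diff, 360 - diff)

# (given, estimated) pairs from Table I, in degrees.
table_i = [(23, 30), (59, 70), (95, 90), (131, 135), (167, 120)]
errors = [angular_error(g, e) for g, e in table_i]
print(errors)                     # [7, 11, 5, 4, 47]
print(sum(errors) / len(errors))  # 14.8
```

Four of the five errors are 11 degrees or less, while the fifth subject's error of 47 degrees accounts for most of the 14.8 degree mean.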
IV. CONCLUSION
Our experiment on the online cursor control task by a population of humans revealed that multiple human agents could learn to gain cooperative control of a cursor using real-time visual feedback. Movement time and movement error decreased as the subjects learned to control the cursor. These results show that the learning process in the human population occurred in a collective way.
Interestingly, the distribution of the subject input values was dispersed over all directions, while the subjects estimated their given directions with considerably small errors. This learning of an uninformed direction by each subject through the visual feedback of cursor action is also evident in the performance of cursor control by the subjects. Even though one subject missed the direction and another subject rarely provided input, the group could control the cursor in all directions. This result indicates that the collective capability to control adjusts adaptively to error.
We will continue to investigate not only how the group decision changes with changes in the environment, but also whether competition between groups has an impact on collective learning.
Figure 5. The spread of the subjects' input. Each row corresponds to a subject and each column to an epoch. In each circle, the red line represents the given direction and
the black line denotes the learned direction of the subject. Each point combines the angle between the target and the cursor with the scalar value input by the subject.
A blue point means that the subject input a positive value.
V. DISCUSSIONS
In the follow-up study, we will focus on how each subject becomes aware of his or her contribution. To this end, we will consider improving our client program to record the subjects' continuous input values more accurately. As shown in Figure 5, the subjects seemed to choose several discrete input values in the positive and negative ranges, although they could choose any value in the continuous input space. This may suggest that we can simplify the input to a set of discrete choices. We will also examine the effect of the number of subjects on learning at both the collective and individual (agent) levels.
In this study, we showed that a learning process occurs in a system of multiple human agents through real-time visual feedback. We regard this learning process as multi-agent learning because it progresses at two separate levels. We will perform an in-depth study of how collective learning is composed of individual agent learning processes.
ACKNOWLEDGMENT
This research was supported by the WCU (World Class University) Program (R31-10008) and the Basic Science Research Program (R1009541) of the National Research Foundation (NRF) of Korea, funded by the Ministry of Education, Science and Technology.
REFERENCES
[1] L. Panait and S. Luke, "Cooperative Multi-Agent Learning: The State of the Art," Autonomous Agents and Multi-Agent Systems, vol. 11, no. 3, pp. 387-434, November 2005.
[2] J. Kant and S. Thiriot, "Modeling one Human Decision Maker with a Multi-Agent System: the CODAGE Approach," AAMAS '06: Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent Systems, May 2006.
[3] E. Norling, "Folk Psychology for Human Modeling: Extending the BDI Paradigm," AAMAS '04: Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, July 2004.
[4] S. Parsons and M. Wooldridge, "Game theory and decision theory in Agent-Based Systems," Autonomous Agents and Multi-Agent Systems, vol. 5, no. 3, pp. 234-254, September 2002.
[5] B. Lebaron, "Agent-based computational finance: suggested readings and early research," Journal of Economic Dynamics and Control, vol. 24, no. 5-7, pp. 679-702, 2002.
[6] F. Guttman, A. G. Moukas, and P. Maes, "Agent-mediated electronic commerce," The Knowledge Engineering Review, vol. 13, no. 2, pp. 147-159, 2002.
[7] A. P. Georgopoulos, A. B. Schwartz, and R. E. Kettner, "Neuronal population coding of movement direction," Science, vol. 233, no. 4771, pp. 1416-1419, September 1986.
[8] S. Sen, M. Sekaran, and J. Hale, "Learning to coordinate without sharing information," in Proceedings of the Twelfth National Conference on Artificial Intelligence, pp. 426-431, 1994.