
Page 1:

Reinforcement Learning in Strategy Selection for a Coordinated Multirobot System

IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART A: SYSTEMS AND HUMANS, VOL. 37, NO. 6, NOVEMBER 2007

Kao-Shing Hwang, Member, IEEE, Yu-Jen Chen, and Ching-Huang Lee

Advisor: Ming-Yuan Shieh

Student: Ching-Chih Wen

S/N: M9820108

PPT completion rate: 100%

Page 2: OUTLINE

Abstract
Introduction
System Formation
  - Basic Behavior
  - Role Assignment
  - Strategies
  - Learning System
  - Dispatching System
Experiments
Conclusion

Page 3: ABSTRACT

This correspondence presents a multi-strategy decision-making system for robot soccer games. Through reinforcement processes, the coordination between robots is learned in the course of a game.

The responsibility of each player varies with the change of its role during state transitions. The system therefore uses several strategies, such as an offensive strategy, a defensive strategy, and so on, for a variety of scenarios.

The major task assigned to the robots in each strategy is simply to occupy good positions.

Utilizing the Hungarian method, each robot can be dispatched to its designated spot with minimal cost.


Page 4: INTRODUCTION (1/3)

Reinforcement learning has recently attracted increasing interest in the fields of machine learning and artificial intelligence, since it promises a way to achieve a specific task using only reward and punishment [1].

Fig.1


Page 5: INTRODUCTION (2/3)

Traditional reinforcement-learning algorithms are often concerned with single-agent problems; however, in a multi-agent environment no agent can act alone, since it must interact with the other agents to achieve a specific task [3].

Therefore, we focus here on high-level learning rather than basic-behavior learning.

The main objective of this correspondence is to develop a reinforcement-learning architecture for multiple coordinated strategies in a robot soccer system.


Page 6: INTRODUCTION (3/3)

In this correspondence, we utilize the robot soccer system as our test platform, since it fully realizes a multi-agent system.

Fig.2


Page 7: SYSTEM FORMATION

Fig.3


Page 8: SYSTEM FORMATION - Basic Behavior

1) Go to a Position (sketched below)
2) Go to a Position With Avoidance
3) Kick a Ball to a Position

Fig.4 Fig.5
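The transcript does not specify the control law behind these behaviors, so the following is a purely hypothetical sketch of behavior 1 for a differential-drive soccer robot: a simple proportional controller in Python, where the gains k_speed and k_turn and the (left, right) wheel-command convention are assumptions, not the authors' design.

import math

def go_to_position(x, y, heading, target_x, target_y, k_speed=1.0, k_turn=2.0):
    """Hypothetical 'Go to a Position' behavior: drive a differential-drive
    robot toward (target_x, target_y) with a proportional control law."""
    dx, dy = target_x - x, target_y - y
    distance = math.hypot(dx, dy)
    # Heading error toward the target, wrapped into [-pi, pi].
    error = math.atan2(dy, dx) - heading
    error = math.atan2(math.sin(error), math.cos(error))
    forward = k_speed * distance            # move faster when far away
    turn = k_turn * error                   # turn toward the target
    return forward - turn, forward + turn   # (left wheel, right wheel) speeds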


Page 9: SYSTEM FORMATION - Role Assignment (1/3)

1) Attacker position

Fig.6 Fig.7

Fig.8


Page 10: SYSTEM FORMATION - Role Assignment (2/3)

2) Sidekick position

Fig.9 Fig.10

3) Backup position

4) Defender position


Page 11: SYSTEM FORMATION - Role Assignment (3/3)

5) Goalkeeper position

Fig.11


Page 12: SYSTEM FORMATION - STRATEGIES (1/2)

1) Primary part: The attacker's weighting is W_a.

2) Offensive part: The weightings of the sidekick and backup are W_s and W_b, respectively.

3) Defensive part: The weightings of the defender and goalkeeper are W_d and W_g, respectively.

Page 13: SYSTEM FORMATION - STRATEGIES (2/2)

According to the different weightings, different strategies can be developed. We develop three strategies as follows:

1) Normal strategy: all weightings are equal, W_a = W_s = W_b = W_d = W_g. (W_a, W_s, W_b, W_d, W_g) = (1, 1, 1, 1, 1) is an example used in our simulations.

2) Offensive strategy: W_a > max(W_s, W_b) and min(W_s, W_b) > max(W_d, W_g). (W_a, W_s, W_b, W_d, W_g) = (2, 1.5, 1.5, 1, 1) is an example used in our simulations.

3) Defensive strategy: W_a > max(W_d, W_g) and min(W_d, W_g) > max(W_s, W_b). (W_a, W_s, W_b, W_d, W_g) = (2, 1, 1, 1.5, 1.5) is an example used in our simulations.
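As an illustrative sketch only (not the authors' code), the three strategies above can be tabulated as weight vectors; the dictionary layout and the is_offensive helper are assumptions, while the numeric values are the examples just quoted.

# Hypothetical encoding of the three strategies as weight vectors
# (W_a, W_s, W_b, W_d, W_g): attacker, sidekick, backup, defender, goalkeeper.
STRATEGIES = {
    "normal":    (1.0, 1.0, 1.0, 1.0, 1.0),
    "offensive": (2.0, 1.5, 1.5, 1.0, 1.0),
    "defensive": (2.0, 1.0, 1.0, 1.5, 1.5),
}

def is_offensive(w):
    """Check the offensive ordering: W_a > max(W_s, W_b) and
    min(W_s, W_b) > max(W_d, W_g)."""
    w_a, w_s, w_b, w_d, w_g = w
    return w_a > max(w_s, w_b) and min(w_s, w_b) > max(w_d, w_g)

assert is_offensive(STRATEGIES["offensive"])
assert not is_offensive(STRATEGIES["defensive"])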

Page 14: SYSTEM FORMATION - LEARNING SYSTEM (1/3)

Fig.12


Page 15: SYSTEM FORMATION - LEARNING SYSTEM (2/3)

1) States:

Fig.13

2) Actions: The actions of Q-learning are spontaneous decisions on the strategies taken in each learning cycle. Each action is represented by a set of weights.


Page 16: SYSTEM FORMATION - LEARNING SYSTEM (3/3)

3) Reward function:

— Gain a point: r = 1.
— Lose a point: r = −1.
— Otherwise: r = 0.

4) Q-learning: Based on the states, actions, and reward function, we can fully implement the Q-learning method.

Here, the ε-greedy method is chosen as the action-selection policy, and the probability of exploration ε is 0.1. The learning rate α is 0.8, and the discount factor γ is 0.9. A minimal sketch of this update follows.
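The sketch below assumes a tabular Q-function, a hashable state encoding (left abstract, since the slide defines states only through Fig.13), and the three strategies above as the action set; the parameter values are those quoted on this slide.

import random
from collections import defaultdict

ACTIONS = ("normal", "offensive", "defensive")  # one action per strategy
EPSILON, ALPHA, GAMMA = 0.1, 0.8, 0.9           # values quoted on this slide

Q = defaultdict(float)  # tabular Q-values, Q[(state, action)], start at 0

def select_action(state):
    """Epsilon-greedy policy: explore with probability EPSILON, else greedy."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state):
    """One-step Q-learning:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])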


Page 17: SYSTEM FORMATION - DISPATCHING SYSTEM

First, we introduce the method used to compute the cost.

Since the cost of each robot reaching each target is known, we can compute the total cost of dispatching all robots to their assigned positions; a small worked example follows.
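As a worked sketch (not the authors' implementation): given a cost matrix, finding the minimal-cost dispatch is the classical assignment problem, which the Hungarian method solves. SciPy's linear_sum_assignment provides such a solver; the 5×5 cost values below are invented for illustration.

import numpy as np
from scipy.optimize import linear_sum_assignment

# cost[i][j]: hypothetical cost (e.g., travel distance) for robot i to reach
# strategic position j (attacker, sidekick, backup, defender, goalkeeper).
cost = np.array([
    [4.0, 1.0, 3.0, 6.0, 2.0],
    [2.0, 0.5, 5.0, 4.0, 3.0],
    [3.0, 2.0, 1.0, 5.0, 4.0],
    [6.0, 4.0, 3.5, 1.0, 2.5],
    [5.0, 3.0, 4.0, 2.0, 1.0],
])

# Optimal one-to-one robot-to-position assignment with minimal total cost.
robots, positions = linear_sum_assignment(cost)
print(list(zip(robots, positions)), "total cost:", cost[robots, positions].sum())

For a five-robot team this optimization is tiny, so it can be re-run every decision cycle, which is consistent with the quick role reassignment emphasized in the conclusion.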


Page 18: EXPERIMENTS (1/4)

Multiple Strategy Versus the Benchmark

Fig.14 Fig.15


Page 19: EXPERIMENTS (2/4)

Multiple Strategy Versus Each Fixed Strategy

Fig.16 Fig.17


Page 20: EXPERIMENTS (3/4)

Multiple Strategy Versus Defensive Strategy

Fig.18 Fig.19


Page 21: EXPERIMENTS (4/4)

Multiple Strategy Versus Normal Strategy

Fig.20 Fig.21


Page 22: CONCLUSION

1) Hierarchical architecture: The system is designed hierarchically, from basic behaviors up to strategies. The basic behaviors can also be utilized in other vehicle systems.

2) A general learning-system platform: If another strategy is designed, it can easily be added to our learning system without much alteration. Through the learning process, we can map each state to the best strategy.

3) Dynamic and quick role assignment: In this system, the role of each robot is changeable. We use the linear-programming method to speed up the computation and to find the best dispatch under a given strategy.


Page 23: REFERENCES

[1] F. Ivancic, "Reinforcement learning in multiagent systems using game theory concepts," Univ. Pennsylvania, Philadelphia, PA, Tech. Rep., Mar. 2001. [Online]. Available: http://citeseer.ist.psu.edu/531873.html

[2] V. R. Konda and J. N. Tsitsiklis, "On actor-critic algorithms," SIAM J. Control Optim., vol. 42, no. 4, pp. 1143–1166, 2003.

[3] Y. Shoham, R. Powers, and T. Grenager, "On the agenda(s) of research on multi-agent learning," in Artificial Multiagent Learning: Papers From the 2004 Fall Symposium, S. Luke, Ed. Menlo Park, CA: AAAI Press, Tech. Rep. FS-04-02, 2004, pp. 89–95.

[4] M. Kaya and R. Alhajj, "Modular fuzzy-reinforcement learning approach with internal model capabilities for multiagent systems," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 34, no. 2, pp. 1210–1223, Apr. 2004.

[5] M. C. Choy, D. Srinivasan, and R. L. Cheu, "Cooperative, hybrid agent architecture for real-time traffic signal control," IEEE Trans. Syst., Man, Cybern. A, Syst., Humans, vol. 33, no. 5, pp. 597–607, Sep. 2003.

[6] K. S. Hwang, S. W. Tan, and C. C. Chen, "Cooperative strategy based on adaptive-learning for robot soccer systems," IEEE Trans. Fuzzy Syst., vol. 12, no. 4, pp. 569–576, Aug. 2004.

[7] K. H. Park, Y. J. Kim, and J. H. Kim, "Modular Q-learning based multiagent cooperation for robot soccer," Robot. Auton. Syst., vol. 35, no. 2, pp. 109–122, May 2001.

[8] H. P. Huang and C. C. Liang, "Strategy-based decision making of a soccer robot system using a real-time self-organizing fuzzy decision tree," Fuzzy Sets Syst., vol. 127, no. 1, pp. 49–64, Apr. 2002.

[9] M. Asada and H. Kitano, "The RoboCup challenge," Robot. Auton. Syst., vol. 29, no. 1, pp. 3–12, 1999.

[10] F. S. Hillier and G. J. Lieberman, Introduction to Operations Research. Boston, MA: McGraw-Hill, 2001.

[11] V. Chvátal, Linear Programming. San Francisco, CA: Freeman, 1983.

[12] Accessed on 22 March 2003. [Online]. Available: http://www.fira.net/soccer/simurosot/overview.html

[13] C. H. Papadimitriou and K. Steiglitz, Combinatorial Optimization: Algorithms and Complexity. Englewood Cliffs, NJ: Prentice-Hall, 1982.

[14] K. S. Hwang, Y. J. Chen, and T. F. Lin, "Q-learning with FCMAC in multi-agent cooperation," in Proc. Int. Symp. Neural Netw., 2006, vol. 3971, pp. 599–602.