
A Multi-Objective Neuro-Evolutionary Optimization Approach to Intelligent Game AI Synthesis

1Kar Bin Tan, 2Jason Teo
Evolutionary Computing Laboratory,
School of Engineering and Information Technology,
Universiti Malaysia Sabah,
Kota Kinabalu, Sabah, Malaysia
1 [email protected], 2 [email protected]

Patricia Anthony
Center of Excellence in Semantic Agents,
School of Engineering and Information Technology,
Universiti Malaysia Sabah,
Kota Kinabalu, Sabah, Malaysia
[email protected]

Abstract—Numerous traditional board games such as Backgammon, Chess, Tic-Tac-Toe, Othello, Checkers, and Go have been used as research test-beds for assessing the performance of a wide range of computational intelligence systems, including evolutionary algorithms (EAs) and artificial neural networks (ANNs). These approaches build intelligent search algorithms that find the required solutions for such board games by searching the solution space stochastically. Recently, one particular type of search algorithm has been receiving considerable interest for solving such game problems: multi-objective evolutionary algorithms (MOEAs). Unlike single-objective search algorithms, MOEAs are able to find a set of non-dominated solutions that trade off among the conflicting objectives. In this study, the use of a multi-objective approach for evolving ANNs to play Go is investigated. A simple three-layer feed-forward ANN is evolved with the Pareto Archived Evolution Strategy (PAES) so that computer players can learn to play small-board Go.

Keywords-multi-objective evolutionary algorithms; artificial neural networks; pareto archived evolution strategies; multi-objective optimization; computer go

I. INTRODUCTION

Backgammon, Go/Weiqi, Monopoly, Chess, Ludo, Checkers and Tic-Tac-Toe are well-known traditional board games that are played in most societies today. Board games are strategy games that usually involve two or more players, who take turns placing their pieces on a pre-marked surface. Placed pieces may be moved or removed during the game according to a set of rules. Most board games simulate a competitive, dynamic, real-life environment in which the players aim to beat their opponents in terms of counters, territory obtained, winning position or points accrued.

Owing to the rapid development of science and technology, computer games have become one of the most practical and suitable platforms for studying evolutionary computational systems [17]. Zero-sum board games such as Go, Othello, Checkers, Chess and Tic-Tac-Toe provide a simple yet interesting test-bed for studying both the machine learning and the optimization aspects of soft computing systems. Many board games such as Chess, Othello, Backgammon and Checkers have already been tackled very successfully, with computer playing systems even defeating human world champions [2, 17].

Unfortunately, the game of Go has not reached a comparable state. Although the rules of Go are very simple, only small-board Go has been partially solved in previous research [4, 13]. Because of its large branching factor and the obscure, dynamically changing block structures formed by the stones, Go is not well suited to deterministic search-based algorithms [2, 4, 5, 14, 17]. Moreover, current Go programs have yet to discover good evaluation functions for board positions, which is why progress has remained elusive [1, 2, 13].

MOEAs are population-based stochastic search methods that apply the principle of survival of the fittest, inspired by natural evolution, to explore for globally optimal solutions; they have been used to supervise machine learning for a wide range of difficult real-world problems. Evolving ANNs with meta-heuristics is not a new approach to learning to play Go [1, 8, 12, 13, 14, 16]. However, although a number of researchers have applied neuro-evolution to Go, most of them optimize only a single objective. In the common case, single-objective optimization is expected to outperform multi-objective optimization, because it optimizes the fitness of a solution with respect to a single objective, whereas in the multi-objective case the Pareto solutions depend primarily on a ranking induced by the dominance relation. Based on the literature review conducted, MOEAs have not yet been applied to 7x7 Go, while multi-objective optimization has been shown to outperform single-objective optimization on 5x5 Go in previous research. Furthermore, 7x7 Go has not been solved; hence the motivation for this investigation. In this study, the PAES algorithm is utilized to generate a Pareto-optimal set of ANNs that optimizes two conflicting objectives: maximizing the fitness score on 7x7 Go and minimizing the neural network complexity by reducing the number of hidden units.
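Because selection in this study is driven by the Pareto dominance relation over the two objectives (a higher Go fitness score and fewer hidden units), the comparison can be sketched as follows; the function name and the tuple layout are illustrative assumptions, not code from the paper.

```python
def dominates(a, b):
    """Return True if solution a Pareto-dominates solution b.

    Each solution is a (fitness_score, hidden_units) pair, where the
    fitness score is maximized and the hidden-unit count is minimized.
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


# Example: a higher score with fewer hidden units dominates.
print(dominates((70, 37), (62, 51)))   # True
print(dominates((62, 51), (70, 37)))   # False
print(dominates((70, 51), (62, 37)))   # False (mutually non-dominated)
```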

The remainder of this paper is organized as follows. Section II describes computer Go and GNU Go, which is used as the opponent of the neural Go player, along with its evaluation functions. Section III presents the PAES algorithm and the fitness functions used. Section IV describes the experimental setup. Section V presents and discusses the results. Finally, Section VI outlines future work.

II. GNU GO

Unlike most Go programs, GNU Go is free software that uses hand-coded heuristics and a pattern-matching approach. It is one of the best-known Go programs and has been available since March 1989. Its algorithms and source code are open and documented. It is written in C and can play Go on board sizes from 5x5 to 19x19, and the program is free for anyone to inspect or enhance. GNU Go can also be easily compiled and executed on different operating systems such as GNU/Linux, Windows and Mac OS. It implements the Go Text Protocol (GTP), which allows it to integrate and communicate with other computer Go programs such as GoGui, CGoban and gGo, which also provide a graphical user interface (GUI) for GNU Go. GNU Go can play in ASCII mode, which reduces the computational cost and speeds up play.
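To illustrate how an external program can drive GNU Go through GTP in the way described above, the following minimal sketch starts the engine in GTP mode and exchanges a few standard commands. The command-line flags, the coordinate used and the estimate_score extension command are assumptions based on common GNU Go builds rather than details reported in the paper.

```python
import subprocess

# A minimal GTP session with GNU Go (sketch; assumes gnugo is on the PATH
# and supports the usual --mode gtp and --level options).
engine = subprocess.Popen(
    ["gnugo", "--mode", "gtp", "--level", "10"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
)

def gtp(command):
    """Send one GTP command and return the engine's reply."""
    engine.stdin.write(command + "\n")
    engine.stdin.flush()
    reply = []
    while True:
        line = engine.stdout.readline()
        if line.strip() == "":          # GTP replies end with a blank line
            break
        reply.append(line.strip().lstrip("= "))
    return " ".join(reply)

gtp("boardsize 7")
gtp("komi 6.5")
gtp("clear_board")
gtp("play black D4")                    # the evolved network's move
white_move = gtp("genmove white")       # GNU Go replies with its move
print("GNU Go played:", white_move)
print("Score estimate:", gtp("estimate_score"))  # GNU Go GTP extension (assumed)
gtp("quit")
```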

Move generation in GNU Go proceeds in several phases. In the first and most important phase, GNU Go gathers information about the current board, identifying the groups of directly connected or disconnected stones. This information is then analysed by a variety of pattern-matching modules, and candidate moves are proposed by the move generator. The move generator does not assign values to the candidate moves; it only attaches reasons to each of them by enumerating their justifications. Each move's reasons then go through a detailed valuation phase in which corresponding characteristic values are assigned. Finally, these values are evaluated and ranked according to a set of hand-coded rules, and the move with the highest value is played by GNU Go.

GNU Go is still under active development and is regularly tested by playing thousands of games on several servers such as the No Name Go Server (NNGS), the Legend Go Server in Taiwan, the WING server in Japan and the KGS Go Server. GNU Go has also taken part in several well-known Go tournaments such as the Computer Olympiad, the Gifu Challenge, the European Go Congress, the 21st Century Cup and others. In 2006, GNU Go won the 19x19 Go section of the 11th Computer Olympiad.

III. PAES AND FITNESS FUNCTIONS

This study investigates a multi-objective problem in which two objectives are optimized simultaneously: (1) maximizing the Go-playing fitness score and (2) minimizing the number of hidden units used in the ANN. The ANN is evolved with the (1+1) PAES algorithm [9, 10], which can be summarized as follows (a code sketch of this loop is given after the listing):

1.0 Begin.
2.0 Generate an initial random solution, called the parent; evaluate it according to the two objectives (maximizing the Go-playing fitness score and minimizing the number of hidden units) and put it into the archive. The elements of the weight matrix are assigned random values drawn from a uniform distribution U(-1, 1). Each element of the binary vector p, which indicates whether a hidden neuron exists in the network, is assigned the value 1 with probability 0.5 (based on a random number drawn from a uniform distribution U(0, 1)) and the value 0 otherwise.
3.0 Loop
    3.1 Mutate the parent with some uniform (0, 1) probability to produce an offspring and evaluate the offspring:
        w_ih(child) = w_ih(child) + N(0, 1)
        w_ho(child) = w_ho(child) + N(0, 1)
        if p_h(child) = 0 then p_h(child) = 1; otherwise p_h(child) = 0,
        where w_ih and w_ho denote the input-to-hidden and hidden-to-output weights and p_h is the binary switch of hidden neuron h.
    3.2 Compare the offspring against the parent:
        3.2.1 If the offspring is dominated by the parent, discard the offspring and go back to step 3.1;
        3.2.2 If the parent is dominated by the offspring, the offspring replaces the parent and becomes the parent in the next generation; then compare the offspring with the members in the archive;
        3.2.3 If the two are mutually non-dominated, compare the offspring with the members in the archive;
    3.3 Compare the offspring with the members in the archive:
        3.3.1 If the offspring is dominated by any member of the archive, discard the offspring and go back to step 3.1;
        3.3.2 If the offspring dominates some members of the archive, remove all dominated members from the archive, add the offspring to the archive, and make it the parent in the next generation;
        3.3.3 If the offspring is non-dominated with respect to all members of the archive, add the offspring to the archive and compare its fitness score with the parent's; if the offspring's fitness score is higher, it becomes the parent in the next generation, otherwise go back to step 3.1;
4.0 Repeat until the maximum number of generations is reached or GNU Go is beaten.
5.0 End
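To make the flow of this loop concrete, the following sketch mirrors the steps above for the two objectives used here (maximize the Go fitness score, minimize the number of active hidden units) and, as in the modification described in the next paragraph, omits the crowding test. All names, the mutation helper and the random-evaluation stub are illustrative assumptions; in the actual experiments the evaluation would come from games played against GNU Go.

```python
import random

def dominates(a, b):
    """a and b are (fitness, hidden_units); fitness is maximized, hidden units are minimized."""
    return (a[0] >= b[0] and a[1] <= b[1]) and (a[0] > b[0] or a[1] < b[1])

def random_network(n_in=49, n_hidden=100, n_out=50):
    """Step 2.0: weights drawn from U(-1, 1); hidden-unit switches on with probability 0.5."""
    return {
        "w_ih": [[random.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_in)],
        "w_ho": [[random.uniform(-1, 1) for _ in range(n_out)] for _ in range(n_hidden)],
        "p": [1 if random.random() < 0.5 else 0 for _ in range(n_hidden)],
    }

def mutate(parent, rate):
    """Step 3.1: Gaussian perturbation of the weights and bit-flips of the hidden switches."""
    return {
        "w_ih": [[w + random.gauss(0, 1) if random.random() < rate else w for w in row]
                 for row in parent["w_ih"]],
        "w_ho": [[w + random.gauss(0, 1) if random.random() < rate else w for w in row]
                 for row in parent["w_ho"]],
        "p": [1 - b if random.random() < rate else b for b in parent["p"]],
    }

def evaluate(net):
    """Placeholder for the two objectives: f1 from games against GNU Go, f2 = active hidden units."""
    fitness = random.uniform(0, 100)      # stand-in for the real Go fitness score
    return (fitness, sum(net["p"]))

def paes(mutation_rate=0.8, max_generations=1000):
    parent = random_network()
    parent_obj = evaluate(parent)
    archive = [parent_obj]
    for _ in range(max_generations):      # step 4.0 would also stop once GNU Go is beaten
        child = mutate(parent, mutation_rate)
        child_obj = evaluate(child)
        if dominates(parent_obj, child_obj):                      # 3.2.1
            continue
        if any(dominates(a, child_obj) for a in archive):         # 3.3.1
            continue
        beaten = [a for a in archive if dominates(child_obj, a)]  # 3.3.2
        archive = [a for a in archive if not dominates(child_obj, a)] + [child_obj]
        if dominates(child_obj, parent_obj) or beaten or child_obj[0] > parent_obj[0]:
            parent, parent_obj = child, child_obj                 # 3.2.2 / 3.3.2 / 3.3.3
    return archive

# Example usage (the evaluation here is random, so this only exercises the control flow):
front = paes(mutation_rate=0.8, max_generations=200)
print(sorted(front))
```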

The (1+1) PAES algorithm used here is a modification of the standard (1+1) PAES. The standard algorithm includes a procedure that tests the crowding among the parent, the offspring and the archive members in order to keep track of the degree of crowding in different regions of the solution space. In this study, this procedure is omitted because the second objective (the number of hidden units) takes discrete values. Figure 1 shows the flowchart of the PAES algorithm.

Figure 1. The Pareto Archived Evolution Strategy (PAES) used.

The following two fitness functions are used:

(1)

(2)

The first fitness function, f1, follows Stanley [12]. In f1, ei is the estimated score on move i, n is the number of moves before the final move, and ef is the final score. This fitness function weights the mean of the estimated scores twice as much as the final score, which emphasizes performance over the course of the whole game rather than only the final position. Since GNU Go returns positive scores for White, the scores must be subtracted from 100 so that higher fitness correlates with greater success for Black; both the mean of the estimated scores and the final score are negative when Black is doing well, because positive scores indicate a White lead. In addition, the experimental setting requires that by the end of each game at least one quarter of the board is filled with the stones of both players, which leads to larger values of the mean estimated score. Therefore, if Black wins the game, the fitness score will exceed 110. In the second fitness function, f2, hi is the number of hidden units in the ANN.
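Equations (1) and (2) did not survive in this copy of the text, so the following sketch only encodes the verbal description above; in particular, the exact normalization of f1 used here (twice the mean estimated score plus the final score, subtracted from 100) is an assumption and may differ from the published equation.

```python
def fitness_f1(estimated_scores, final_score):
    """One plausible reading of Equation (1): GNU Go scores are positive for
    White, so they are subtracted from 100, and the mean estimated score over
    the game is weighted twice as much as the final score.  The exact
    normalization of the published equation is an assumption here."""
    n = len(estimated_scores)                  # moves before the final move
    mean_estimate = sum(estimated_scores) / n
    return 100 - (2 * mean_estimate + final_score)

def fitness_f2(hidden_switches):
    """Equation (2): the number of hidden units active in the ANN."""
    return sum(hidden_switches)

# Example: Black is ahead by 10 points on average and wins by 38.5,
# so f1 exceeds 110 as described in the text.
print(fitness_f1([-5, -10, -15], -38.5))   # 158.5
```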

IV. EXPERIMENTAL SETUP

In this study, the game engine of GNU Go version 3.6 is used as the opponent of the evolving networks, as shown in Figure 2. A simple three-layer feed-forward ANN is used, and the games are conducted on a 7x7 board with a komi of 6.5 for White. All neurons employ the sigmoid activation function. GNU Go plays White while the network plays Black. The games are played in ASCII mode in order to reduce the computational cost, and the Japanese scoring system is used. Mutation rates from 0.1 to 0.9 are used in PAES, with 10 repeated runs for each setting. GNU Go 3.6 is set to level 10, and its random seed value is changed in every new game.
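The experimental grid above (mutation rates from 0.1 to 0.9, 10 repeated runs each, level-10 GNU Go on a 7x7 board with komi 6.5) could be driven by a loop of the following shape. The driver itself and the run_paes stub are illustrative assumptions; in the real experiments each run would launch the PAES evolution of Section III against GNU Go.

```python
import random

# Settings from Section IV; the driver below is an illustrative sketch,
# not code from the paper.
SETTINGS = {"board_size": 7, "komi": 6.5, "gnugo_level": 10}
MUTATION_RATES = [round(0.1 * k, 1) for k in range(1, 10)]   # 0.1 .. 0.9
RUNS_PER_SETTING = 10

def run_paes(mutation_rate, settings, seed):
    """Stand-in for one evolutionary run against GNU Go.  It would normally
    launch the (1+1) PAES loop of Section III with these settings and return
    the best (fitness, hidden_units) found; here it just returns random values."""
    rng = random.Random(seed)
    return (rng.uniform(30.0, 80.0), rng.randint(30, 70))

results = {}
for rate in MUTATION_RATES:
    runs = [run_paes(rate, SETTINGS, seed) for seed in range(RUNS_PER_SETTING)]
    best = max(runs, key=lambda r: r[0])
    results[rate] = {
        "best_fitness": best[0],
        "hidden_units_of_best": best[1],
        "average_fitness": sum(r[0] for r in runs) / len(runs),
    }

for rate, summary in sorted(results.items()):
    print(rate, summary)
```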

The chromosome used in this study is a class consisting of a matrix of real numbers that represents the weights of the ANN. The ANN has 49 inputs corresponding to the current positions on the 7x7 board. The board situation is encoded with one value per intersection (empty = 0, white = 1, black = 2), which is fed into the input layer. The ANN has 50 output neurons that decide Black's next move, representing every possible playing position on the 7x7 board plus a pass move. The output with the highest activation indicates the position on the board that Black will play; if the highest activation indicates an invalid position, the second-highest activation is used instead, and so on. Initially, the number of hidden units is set to 100. A binary number associated with each hidden unit acts as a switch that turns the neuron on or off; the hidden units are evolved using a bit-wise mutator and are minimized during evolution.
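A minimal sketch of how the encoding just described could map a board position to a move is given below, assuming a plain feed-forward pass with sigmoid units and using the binary switches to disable hidden neurons; the data layout and helper names are illustrative assumptions rather than the authors' implementation.

```python
import math
import random

N_IN, N_HIDDEN, N_OUT = 49, 100, 50          # 7x7 inputs, 100 hidden, 49 moves + pass

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def encode_board(board):
    """board is a 7x7 grid of 'empty'/'white'/'black'; empty=0, white=1, black=2."""
    codes = {"empty": 0.0, "white": 1.0, "black": 2.0}
    return [codes[cell] for row in board for cell in row]

def choose_move(net, board, is_legal):
    """Forward pass; the output with the highest activation is Black's move.
    Index 49 stands for the pass move; invalid positions fall through to the
    next-highest activation."""
    x = encode_board(board)
    hidden = [sigmoid(sum(x[i] * net["w_ih"][i][h] for i in range(N_IN))) * net["p"][h]
              for h in range(N_HIDDEN)]
    output = [sigmoid(sum(hidden[h] * net["w_ho"][h][o] for h in range(N_HIDDEN)))
              for o in range(N_OUT)]
    for move in sorted(range(N_OUT), key=lambda o: output[o], reverse=True):
        if move == N_OUT - 1 or is_legal(move):
            return move                       # 0..48 = board intersections, 49 = pass
    return N_OUT - 1

# Minimal usage example with random weights and an empty board.
net = {"w_ih": [[random.uniform(-1, 1) for _ in range(N_HIDDEN)] for _ in range(N_IN)],
       "w_ho": [[random.uniform(-1, 1) for _ in range(N_OUT)] for _ in range(N_HIDDEN)],
       "p": [1] * N_HIDDEN}
board = [["empty"] * 7 for _ in range(7)]
print(choose_move(net, board, is_legal=lambda move: True))
```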

Figure 2. The GNU Go version 3.6 game engine is used and all games are played in ASCII mode. GNU Go plays White while the evolved network plays Black, with a komi of 6.5 for White.

V. RESULTS AND DISCUSSIONS

TABLE I. RESULTS OBTAINED USING MULTI-OBJECTIVE OPTIMIZATION IN 7X7 BOARD GO GAMES

Mutation Rate | Best Fitness Score | Hidden Units | Average Fitness Score
0.1           | 62                 | 51           | 48.0
0.2           | 70                 | 37           | 51.8
0.3           | 60                 | 44           | 47.2
0.4           | 64                 | 56           | 49.5
0.5           | 60                 | 45           | 42.3
0.6           | 66                 | 50           | 41.9
0.7           | 58                 | 43           | 38.0
0.8           | 172                | 53           | 42.1
0.9           | 164                | 60           | 50.7

The results show that the percentage of evolved networks able to beat GNU Go on the 7x7 board is very low. Mutation rates from 0.1 to 0.9 were used in PAES, and for mutation rates from 0.1 to 0.7 the percentage of wins against GNU Go was 0. With large mutation rates such as 0.8 and 0.9, GNU Go was beaten, but only once out of the 10 repeated runs in each case. The fitness scores obtained in these two runs were 172 and 164, the highest among all mutation rates, and the number of hidden units in the corresponding ANNs was reduced from 100 to 53 and 60, respectively. Although the percentage of ANNs evolved with the (1+1) PAES algorithm that beat GNU Go on the 7x7 board is very low, the results show that this algorithm has a great deal of potential for generating different types of Go strategies that can challenge GNU Go on the 7x7 board. Figures 3 to 5 below show the beginning, the middle and the end of a game in which the evolved player beat GNU Go.

Figure 3. Beginning of the game. GNU Go always plays in the middle of the board, and the estimated score shows White ahead by 12 points.

Figure 4. Middle game. The White stones have been surrounded by Black stones, and the estimated score shows Black leading the game.

Figure 5. End game. The evolved network beat GNU Go on the 7x7 board.


TABLE II. TESTING RESULTS OF THE HIGHEST FITNESS SCORES IN 7X7 BOARD SIZE GO GAMES

Mutation Rate | Best Fitness Score | Hidden Units | Successful Runs | Final Game Score
0.8           | 172                | 53           | 100%            | B+38.5
0.9           | 164                | 60           | 100%            | B+22.5

The two networks that obtained the highest fitness scores were tested further: the networks evolved with mutation rates of 0.8 and 0.9 in PAES, consisting of 53 and 60 hidden units respectively. The testing results in Table II show that both networks beat GNU Go on the 7x7 board in 100% of the test games, with a difference of 16 game points between the two networks' final scores against GNU Go. This performance is surprisingly good, and it means that the network evolved with a 0.8 mutation rate in (1+1) PAES, consisting of 53 hidden units, is able to generate good Go strategies and frequently wins against level-10 GNU Go on the 7x7 board.

VI. FUTURE WORKS

As future work, the fitness function can be further researched and improved. Fitness functions play an important role in evolving ANNs, and they are not trivial to design since evaluating Go board positions is not an easy task. With a suitable fitness function, a co-evolutionary approach might also perform well on larger Go boards. The concept of incremental evolution may also be considered, whereby the evolution process starts against a beginner-level opponent and progresses to harder levels. The evolved ANN players should also play against other competitive computer Go programs in order to learn different playing strategies and to acquire different sets of knowledge and experience. Hence, there remain many future directions worth pursuing in order to fully explore the robustness of evolutionary algorithms on larger Go boards.

ACKNOWLEDGMENT

This research is funded under the ScienceFund project SCF52-ICT-3/2008 granted by the Ministry of Science, Technology and Innovation, Malaysia.

REFERENCES

[1] A. Lubberts and R. Miikkulainen, "Co-Evolving a Go-Playing Neural Network," in Coevolution: Turning Adaptive Algorithms upon Themselves, Birds-of-a-Feather Workshop, Genetic and Evolutionary Computation Conference (GECCO), San Francisco, CA, USA, July 2001.

[2] B. Bouzy and T. Cazenave, “Computer Go: an AI Oriented Survey,” Artificial Intelligence Journal, vol. 132, pp. 39-103, 2001.

[3] D. B. Fogel, T. J. Hays, S. L. Hahn and J. Quon, “A Self-Learning Evolutionary Chess Program,” in Proceedings of the IEEE, vol. 92(12), December 2004, pp. 1947-1954.

[4] E. C. D. van der Werf, H. J. van den Herik, and J. W. H. M. Uiterwijk, “Solving Go on Small Boards,” International Computer Games Association (ICGA) Journal, vol 26(2), pp. 92-107, 2003.

[5] E. C. D. van der Werf, H. J. van den Herik, and J. W. H. M. Uiterwijk, “Learning to score final positions in the games of Go,” in Proceedings of the 10th Advances in Computer Games Conference (ACG-10), 2003, pp. 143-158.

[6] GNU Go Documentation, http://www.gnu.org/software/gnugo/gnugo_toc.html, accessed January 2011.

[7] G. F. Luger, Artificial Intelligence: Structures and Strategies for Complex Problem Solving, 4th ed., U.S.: Addison Wesley, 2002.

[8] H. A. Mayer and P. Maier, “Coevolution of Neural Go Players in a Cultural Environment,” in Proceedings of the Congress on Evolutionary Computation, Edinburgh, Scotland, 2005, pp. 1017-1024.

[9] J. D. Knowles and D. W. Corne, “The Pareto Archived Evolution Strategy: A New Baseline Algorithm for Pareto Multi-objective Optimization,” in Proceedings of the Congress on Evolutionary Computation (CEC), 1999, pp. 98-105.

[10] J. D. Knowles and D. W. Corne, “Approximating the nondominated front using the Pareto Archived Evolution Strategy”. Evolutionary Computation, vol. 8(2), pp. 149–172, June 2000.

[11] J. Schrum and R. Miikkulainen, “Constructing Complex NPC Behavior via Multi-Objective Neuroevolution,” in Proceedings of the Fourth Artificial Intelligence and Interactive Digital Entertainment Conference, Stanford, California, USA, October 22-24, 2008, pp. 108-113.

[12] K. O. Stanley and R. Miikkulainen, “Evolving a Roving Eye for Go,” in Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), New York, NY: Springer-Verlag, 2004, pp. 1226-1238.

[13] L. Wu and P. Baldi, "Learning to Play Go Using Recursive Neural Networks," Neural Networks, vol. 21(9), pp. 1392-1400, 2008.

[14] M. Enzenberger, “Evaluation in Go by a Neural Network Using Soft Segmentation,” in Proceedings of the 10th Advances in Computer Games Conference, 2003, pp. 97-108.

[15] M. Negnevitsky, Artificial Intelligence: A Guide to Intelligent Systems, U.K.: Addison Wesley, 2002.

[16] N. Richards, D. Moriarty, P. McQuesten, and R. Miikkulainen, “Evolving Neural Networks to Play Go”. Applied Intelligence, vol. 8, pp. 85-96, 1998.

[17] S. M. Lucas, “Computational Intelligence and Games: Challenges and Opportunities,” International Journal of Automation and Computing, vol. 5(1), pp. 45-57, January 2008.