[lecture notes in computer science] computational collective intelligence. technologies and...

N.-T. Nguyen et al. (Eds.): ICCCI 2012, Part II, LNAI 7654, pp. 182–191, 2012. © Springer-Verlag Berlin Heidelberg 2012

Opponent Modeling in Texas Hold’em Poker

Grzegorz Fedczyszyn, Leszek Koszalka, and Iwona Pozniak-Koszalka

Department of Systems and Computer Networks, Wroclaw University of Technology, Wroclaw, Poland


Abstract. In this paper a new algorithm for predicting opponents’ moves in the Texas Hold’em Poker game is presented. The algorithm is based on an artificial intelligence approach: it uses several neural networks, each trained on a specific dataset. The results given by the algorithm may be applied to improve players’ game. Moreover, the algorithm may be used as a part of a more complex algorithm created for supporting decision making in Texas Hold’em Poker.

Keywords: Poker game, algorithm, artificial intelligence, neural network.

1 Introduction

Texas Hold’em Poker is currently the world’s most played card game. Hundreds of thousands of people play it every day, both in online Poker rooms and in real life. One of the main reasons for Poker’s recent success is its fundamental dynamics: the ‘hidden’ elements of the game mean that players must observe their opponents’ characteristics to be able to make a good move, i.e., to choose a good decision.

In order to play Poker well, a Poker player needs to constantly think about the next move to be made by his opponents. For example, when the player’s hand is very likely to win in a certain spot, he should figure out how to win the most money from his opponents. A good player also has to recognize spots where the opponents may fold, estimate how often this will happen, and judge when bluffing may be a profitable option. To do this, the player must take several factors into account, such as: what kind of opponents he is facing, what kind of board is on the table, what his and the opponents’ previous actions were, whether he has position over the opponent, and many more.

Several ideas for opponent modeling in Texas Hold’em Poker can be found in the literature. Van der Kleij [1] described an algorithm based on decision trees, while Mccurley [2] proposed an artificial intelligence agent approach.

In this paper, we propose a machine learning algorithm that predicts an opponent’s action when certain information about the opponent and the current state of the game is given. The algorithm is based on several neural networks trained to predict Poker players’ moves.

The rest of the paper is organized as follows. Section 2 explains the rules of Texas Hold’em Poker. Section 3 describes two algorithms for predicting Poker moves, including the proposed one. Section 4 focuses on the designed and implemented experimentation system. Section 5 is devoted to investigations: it contains a brief analysis of the results of experiments and their discussion. Conclusions and perspectives appear in Section 6.

2 Texas Hold’em Poker

2.1 Game Description

Texas Hold’em Poker is played with a standard deck of 52 cards. Typically, the number of players in cash games varies from 2 to 9. Each player is dealt 2 cards face down. These cards are called hole cards or pocket cards. Together with the community cards, they are used to compose a five-card hand; each such hand belongs to one of the categories listed in Table 1. This category determines the strength of the player’s hand.

Table 1. Poker hand ranks

Hand               Description
Royal flush        A straight flush from T to A
Straight flush     A straight of a single suit
Four of a kind     Four cards of the same rank
Full house         Three of a kind + one pair
Flush              Five cards of the same suit
Straight           Five cards of sequential rank
Three of a kind    Three cards of the same rank
Two pairs          Two pairs of different ranks
One pair           Two cards of the same rank
High card          None of the above

The game starts when the two players to the left of the dealer pay the small blind and the big blind.

The dealer is one of the players, and this designation moves clockwise around the table after each hand is finished. The small blind and the big blind are small amounts of money that have to be paid by the two players to the left of the dealer before they can see their hole cards. These forced bets are the cost of playing: they force all players to play more hands. Without the blinds, players could just sit at the table, wait for the best starting hand and fold every other hand.

When the blinds are on the table, players may look at their hole cards and the preflop betting round begins. Each betting round is also called a street or a barrel. In the preflop betting round, players act clockwise, starting from the player to the left of the big blind. There are five possible player actions. When a player is not facing a bet, he can bet or check; when he is facing a bet, he can raise, call or fold. Of course, a player can also fold when he is not facing a bet, but this move does not make any logical sense and is not counted as a possibility in this work. The actions are defined as follows:


• Fold: the player does not put in any more money, discards his cards and surrenders his chance of winning the pot. If he was one of two players left in the game, the remaining player wins the pot and does not have to show his pocket cards.

• Call: the player matches the current maximum amount wagered and stays in the game. If he is the last to act in a betting round, the game continues with the next betting round or the showdown.

• Raise: the player matches the current maximum amount wagered and additionally puts in extra money that the other players now have to call in order to stay in the game.

• Bet: this is similar to a raise, but it is made when the player is not facing a bet by an opponent (the amount to call is 0) and puts in the first money.

• Check: this is similar to a call, except the amount to call must be 0: if no one has bet (or the player has already put in the current highest bet, which could be the case if he paid the big blind) and the player to act does not want to put in any money, he can check.
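The actions defined above, together with the rule that a betting round ends once every remaining player has matched the highest bet, can be made concrete with a small bookkeeping class. This is a minimal sketch, not the authors’ code: the `BettingRound` class, its fields and the `acted` set (which tracks who has responded since the last bet or raise) are hypothetical, and all-in situations and side pots are not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class BettingRound:
    """Bookkeeping for one betting round (hypothetical helper; all-in is not modelled)."""
    contributions: dict                       # player -> chips put in during this round
    active: set = field(default_factory=set)  # players who have not folded
    acted: set = field(default_factory=set)   # players who acted since the last bet/raise

    def to_call(self, player) -> int:
        highest = max(self.contributions[p] for p in self.active)
        return highest - self.contributions[player]

    def legal_actions(self, player) -> list:
        # Facing a bet: fold, call or raise; otherwise: check or bet.
        # Folding when not facing a bet is excluded, as in the text.
        if self.to_call(player) > 0:
            return ["fold", "call", "raise"]
        return ["check", "bet"]

    def apply(self, player, action, amount=0) -> None:
        if action not in self.legal_actions(player):
            raise ValueError(f"{action!r} is not legal for {player!r}")
        if action == "fold":
            self.active.discard(player)
        elif action in ("call", "check"):
            self.contributions[player] += self.to_call(player)  # adds 0 for a check
            self.acted.add(player)
        else:  # "bet" or "raise": match the current bet, then put in `amount` extra
            if amount <= 0:
                raise ValueError("a bet or raise needs a positive amount")
            self.contributions[player] += self.to_call(player) + amount
            self.acted = {player}  # everyone else has to respond again

    def is_over(self) -> bool:
        # Over when one player is left, or when every active player has acted
        # and all of them have matched the highest bet.
        if len(self.active) == 1:
            return True
        return self.active <= self.acted and all(
            self.to_call(p) == 0 for p in self.active)
```

For example, with blinds of 1 and 2 posted, a raise of 4 by the button puts 6 chips in; if the small blind folds and the big blind calls, the round is over. Because the big blind has not yet acted after posting, a preflop round in which everybody just calls stays open until he checks.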

A betting round is over when either all players but one have folded, or all players who did not fold have matched the current highest bet by calling it or by going all-in (putting all their money into the pot). After the preflop round is over, three cards are dealt face up on the table. These three cards are called the flop, and the next betting round, also called the flop, takes place. The first active player to the left of the button is the first to act. The actions that can be performed by players are the same as in the preflop round. After this betting round is over, one more card, called the turn, is dealt face up and added to the community cards, and the next betting round begins. If after it there are still two or more players in the game who have not folded, the fifth and last community card is dealt on the table. This card is called the river, and it starts the last betting round. When the river betting round has completed and two or more players are still in the game, the showdown follows.
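The street sequence and the acting order described above can be sketched as follows. The card encoding (a rank character followed by a suit character), the helper names and the seat numbering are illustrative assumptions. The sketch assumes at least three players (heads-up blind rules differ), does not skip folded players in `acting_order`, and deals the hole cards two at a time for brevity.

```python
import random

RANKS = "23456789TJQKA"
SUITS = "cdhs"
# Community cards revealed before each betting round.
STREETS = [("preflop", 0), ("flop", 3), ("turn", 1), ("river", 1)]


def deal_hand(n_players, seed=None):
    """Deal hole cards and the board street by street from a shuffled 52-card deck."""
    deck = [r + s for r in RANKS for s in SUITS]
    random.Random(seed).shuffle(deck)
    hole = {seat: [deck.pop(), deck.pop()] for seat in range(n_players)}
    board, boards = [], {}
    for street, n_new in STREETS:
        board += [deck.pop() for _ in range(n_new)]
        boards[street] = list(board)  # community cards visible on this street
    return hole, boards


def acting_order(n_players, button, street):
    """Seats 0..n-1 clockwise; assumes 3+ players and ignores folded seats."""
    # Preflop: small blind = button + 1, big blind = button + 2, so the player
    # left of the big blind (button + 3) acts first. Later streets: button + 1.
    first = button + 3 if street == "preflop" else button + 1
    return [(first + i) % n_players for i in range(n_players)]
```

With six players and the button on seat 4, seat 1 opens the preflop betting and seat 5 opens every later street.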

At the showdown, each remaining player reveals his pocket cards and the player with the strongest five-card Poker hand wins the pot; if two or more players have hands of equal strength, they split the pot. A Poker hand must be composed of five cards. A player can use both of his hole cards, one of them, or none; in the last case his hand is composed only of community cards.
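A compact way to apply Table 1 and the showdown rule is to score every five-card subset of a player’s seven cards and keep the best one. The sketch below is one straightforward evaluator written for illustration (the function names and the tie-breaking key are assumptions, and it favours clarity over speed). It treats a royal flush as the ace-high straight flush and returns all tied players, so that the pot can be split.

```python
from collections import Counter
from itertools import combinations

RANK_VALUE = {r: v for v, r in enumerate("23456789TJQKA", start=2)}
CATEGORIES = ["High card", "One pair", "Two pairs", "Three of a kind", "Straight",
              "Flush", "Full house", "Four of a kind", "Straight flush"]


def rank5(cards):
    """Comparable key (category index, tie-breakers) for five cards such as 'As'."""
    values = sorted((RANK_VALUE[c[0]] for c in cards), reverse=True)
    is_flush = len({c[1] for c in cards}) == 1
    distinct = sorted(set(values), reverse=True)
    straight_high = 0
    if len(distinct) == 5 and distinct[0] - distinct[4] == 4:
        straight_high = distinct[0]
    elif distinct == [14, 5, 4, 3, 2]:  # A-2-3-4-5, the ace plays low
        straight_high = 5
    # Rank groups sorted by size, then by rank: K-K-K-9-9 -> [(3, 13), (2, 9)]
    groups = sorted(((n, v) for v, n in Counter(values).items()), reverse=True)
    sizes = [n for n, _ in groups]
    by_group = [v for _, v in groups]
    if straight_high and is_flush:
        return (8, [straight_high])
    if sizes == [4, 1]:
        return (7, by_group)
    if sizes == [3, 2]:
        return (6, by_group)
    if is_flush:
        return (5, values)
    if straight_high:
        return (4, [straight_high])
    if sizes == [3, 1, 1]:
        return (3, by_group)
    if sizes == [2, 2, 1]:
        return (2, by_group)
    if sizes == [2, 1, 1, 1]:
        return (1, by_group)
    return (0, values)


def category_name(key):
    if key == (8, [14]):
        return "Royal flush"
    return CATEGORIES[key[0]]


def best_hand(hole, board):
    """Best key over all 5-card subsets, so 2, 1 or 0 hole cards may be used."""
    return max(rank5(combo) for combo in combinations(hole + board, 5))


def showdown(hands, board):
    """Players holding the best hand; more than one name means a split pot."""
    keys = {player: best_hand(cards, board) for player, cards in hands.items()}
    top = max(keys.values())
    return [player for player, key in keys.items() if key == top]
```

If the board itself is a royal flush, both players play only community cards and `showdown` returns both names.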

2.2 States of Poker Game

As described above, a Poker game consists of 4 betting rounds (also called streets or barrels): preflop, flop, turn and river. In each of those rounds a player can either be facing a bet, with the option to raise, call or fold, or not be facing a bet, with the option to check, bet or fold. Folding while not facing a bet is possible but does not make any logical sense, so this move is not considered; it almost never appears in a real Poker game.

This representation yields 8 states of a Poker game:

(i) preflop facing a bet, (ii) preflop not facing a bet, (iii) flop facing a bet, (iv) flop not facing a bet, (v) turn facing a bet, (vi) turn not facing a bet, (vii) river facing a bet, (viii) river not facing a bet.
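The eight states can be indexed directly from the street and from whether the player is facing a bet. A minimal sketch with hypothetical helper names; the numbering follows the list (i) to (viii) above.

```python
from itertools import product

STREETS = ["preflop", "flop", "turn", "river"]


def game_state(street, facing_bet):
    """State number 1..8, i.e. (i)..(viii) in the list above."""
    return 2 * STREETS.index(street) + (1 if facing_bet else 2)


def output_classes(facing_bet):
    # Folding when not facing a bet is excluded, so those states have two classes.
    return ("fold", "call", "raise") if facing_bet else ("check", "bet")


ALL_STATES = {game_state(s, f): (s, f) for s, f in product(STREETS, (True, False))}
```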

3 Algorithms

3.1 Opponent Modeling Using Decision Trees

The algorithm presented in [1] applies machine learning to predict opponents’ moves. The authors decided to use decision trees, as they give results very fast once trained and have many other advantages needed in this particular case. In the implementation, the input dataset is composed of over 40 different features describing different states and aspects of the game. The authors of this algorithm also proposed a method called group-specific opponent modeling. In this method, different trees are created not only for different states of the game but also for different kinds of opponents whose moves they are trying to predict. They created 9 different opponent models using K-model clustering. As a result, 72 different decision trees are considered, one for each combination of opponent model and game state (9 × 8). A more detailed description of the algorithm can be found in [1].

3.2 Opponent Modeling Using Neural Networks

In this subsection, the newly created algorithm for predicting opponents’ moves is presented. The algorithm is a modification of the method presented in Section 3.1. The most important difference is that neural networks are used instead of decision trees.

This modification may significantly improve the quality of move classification, as neural networks are known to perform well in noisy domains [4].

Inputs and Outputs. For each state of the Poker game, one multi-layer neural network is created. The networks differ in the number of inputs, since on each street more information can be used to predict the opponent’s move: 20 inputs on the preflop, 31 on the flop, 37 on the turn and 43 on the river. The chosen inputs describe the state of the game using information about the current board, the opponents the player is facing, his position at the table, stack sizes, etc. They represent aspects of the game that good Poker players usually take into account when trying to predict an opponent’s move. Each network has 2 or 3 outputs: 2 when not facing a bet (check or bet) and 3 when facing a bet (fold, call and raise).
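Combining the input counts above with the network parameters given in Section 4 (a hidden layer with half as many neurons as inputs, K = 8 clusters) gives the shape of every network. The sketch below assumes that "half" is rounded down for odd input counts and that every neuron has a bias weight; neither detail is stated in the text, and the helper names are hypothetical.

```python
INPUTS = {"preflop": 20, "flop": 31, "turn": 37, "river": 43}  # from the text


def topology(street, facing_bet):
    """(inputs, hidden, outputs) for one network; hidden = inputs // 2 is assumed."""
    n_in = INPUTS[street]
    n_out = 3 if facing_bet else 2  # fold/call/raise or check/bet
    return n_in, n_in // 2, n_out


def weight_count(street, facing_bet):
    """Number of weights, assuming one bias weight per hidden and output neuron."""
    n_in, n_hid, n_out = topology(street, facing_bet)
    return (n_in + 1) * n_hid + (n_hid + 1) * n_out


def network_count(k_clusters, n_states=8):
    """One network per (cluster, game state) pair."""
    return k_clusters * n_states
```

With K = 8 this gives the 64 networks mentioned in Section 6; with 9 clusters the same count gives the 72 decision trees of [1].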

K-model Clustering. In a Poker game, different players use different strategies. Some factors may be more important for one player than for another. Some players tend to call more often, while others wait for a very strong hand and fold all other hands. Good Poker players usually label each of their opponents to remember how to play against them. Clustering players in a similar way should therefore improve the quality of opponent modeling. The K-model clustering used in this work, unlike k-means clustering, does not require any expert knowledge or distance measure.

In the K-model clustering approach, one set of neural networks is created for each cluster of players. At first, each player is randomly assigned to one cluster. Then, the set of neural networks corresponding to each cluster is trained on the hands of the players in that cluster (Fig. 1). The next step is to forget the current cluster assignment of the players and to find a new one. At this point, the models have been trained on randomly grouped players (Fig. 2).

Fig. 1. Training models

Fig. 2. After training

Fig. 3. Choosing a new cluster

Fig. 4. Moving player to a new cluster

Then, we try to find a new cluster for each player by calculating the accuracy of each trained model on that player’s hands (Fig. 3). Each player is assigned to the model that gives the highest correct classification rate for his hands (Fig. 4).

After this is done for all players, we should get results like those in Fig. 5, where each cluster contains mostly players of the same type.

Next, we repeat this procedure until none of the players changes his cluster assignment (Fig. 6).


Fig. 5. Example results after first iteration

Fig. 6. Example results after convergence
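The K-model loop can be sketched independently of the model type. To keep the example short and runnable, the per-cluster neural networks are replaced here by a much simpler stand-in, a table of the most common action in each game state, so this illustrates the clustering procedure only, not the authors’ networks. Hands are represented as hypothetical `(state, action)` pairs, every player is assumed to have at least one move, and `max_iter` plays the role of the early stop described in Section 5.

```python
import random
from collections import Counter, defaultdict


def train_model(moves):
    """Stand-in for a cluster's networks: the most common action per game state."""
    counts = defaultdict(Counter)
    for state, action in moves:
        counts[state][action] += 1
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}


def accuracy(model, moves):
    """Share of a player's moves that the model predicts correctly."""
    return sum(model.get(state) == action for state, action in moves) / len(moves)


def k_model_clustering(players, k, max_iter=50, seed=0, init=None):
    """players: name -> list of (state, action). Returns (assignment, models)."""
    rng = random.Random(seed)
    # 1. Random initial assignment (or a given one).
    assign = dict(init) if init else {p: rng.randrange(k) for p in players}
    for _ in range(max_iter):
        # 2. Train one model per cluster on the hands of its players.
        models = [train_model([m for p in players if assign[p] == c for m in players[p]])
                  for c in range(k)]
        # 3. Forget the assignment; move each player to the model that fits him best.
        new_assign = {p: max(range(k), key=lambda c: accuracy(models[c], players[p]))
                      for p in players}
        # 4. Stop when no player changes cluster.
        if new_assign == assign:
            break
        assign = new_assign
    return assign, models
```

With an unlucky random start a cluster can end up empty, or two clusters can predict the same actions, in which case the loop converges to a poor grouping; the procedure only guarantees that no single reassignment step changes anything at the end.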

4 Experimentation System

To test how the proposed algorithm works, an experimentation system was created. The idea of the system is presented in Fig. 7. Two kinds of input are distinguished: (i) problem parameters, i.e., the dataset, and (ii) algorithm parameters, i.e., the neural network parameters. As outputs, we consider the trained neural networks and the results of the classification process.

Fig. 7. Experimentation system

Input Dataset. The dataset was created by observing online Poker games and saving the hand histories in text files. For the experiments, 100,000 hands played at No Limit $2, No Limit $5 and No Limit $10 tables with 5 and 6 players were used. This dataset gave 753,595 moves (network input objects) performed by players. Such a large number of input objects was used because the noise and variance of the domain are very high. Dataset statistics can be found in Table 2. As shown in this table, the most common move on the preflop is fold (71.71% of moves), while the most common move on the later streets is check: 39.91% of moves on the flop, 44.39% on the turn and 47.36% on the river. This means that a dummy classifier that always chose the most common move on each street would achieve 61% overall correct classification accuracy.

Neural Network Parameters. All networks were trained using the backpropagation learning algorithm with the learning rate set to 0.3. Each network was trained over the same dataset 500 times (epochs), as this number of iterations was found to be sufficient to minimize the error as much as possible. K-model clustering was set to divide players into 8 clusters (K = 8). Each network is a two-layer network in which the number of neurons in the hidden (middle) layer is half the number of inputs. The activation function in all neurons was the bipolar sigmoid function with the alpha parameter set to 1. The momentum of each network was set to 0.1.
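The training setup (one hidden layer with half as many neurons as inputs, bipolar sigmoid activations with alpha = 1, backpropagation with learning rate 0.3 and momentum 0.1) can be sketched in plain Python. This is an illustrative reimplementation, not the library or code the authors used; the weight initialisation range, the bias inputs and the +1/-1 one-hot targets are assumptions.

```python
import math
import random


def bipolar_sigmoid(x, alpha=1.0):
    """2 / (1 + exp(-alpha * x)) - 1, written as tanh(alpha * x / 2) to avoid overflow."""
    return math.tanh(alpha * x / 2.0)


class TwoLayerNet:
    """n_in inputs -> n_in // 2 hidden neurons -> n_out outputs, bipolar sigmoid everywhere."""

    def __init__(self, n_in, n_out, alpha=1.0, lr=0.3, momentum=0.1, seed=0):
        rng = random.Random(seed)
        self.alpha, self.lr, self.momentum = alpha, lr, momentum
        n_hid = max(1, n_in // 2)
        # Each weight row has one extra entry for the bias input (fixed at 1.0).
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hid)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hid + 1)] for _ in range(n_out)]
        # Previous weight changes, needed for the momentum term.
        self.v1 = [[0.0] * (n_in + 1) for _ in range(n_hid)]
        self.v2 = [[0.0] * (n_hid + 1) for _ in range(n_out)]

    def _layer(self, weights, inputs):
        x = list(inputs) + [1.0]
        return [bipolar_sigmoid(sum(w * xi for w, xi in zip(row, x)), self.alpha)
                for row in weights]

    def forward(self, inputs):
        hidden = self._layer(self.w1, inputs)
        return hidden, self._layer(self.w2, hidden)

    def predict(self, inputs):
        """Index of the output neuron with the largest activation."""
        out = self.forward(inputs)[1]
        return max(range(len(out)), key=out.__getitem__)

    def train_sample(self, inputs, target):
        """One backpropagation step with momentum; returns the squared error before it."""
        hidden, out = self.forward(inputs)
        half_alpha = self.alpha / 2.0  # f'(x) = (alpha / 2) * (1 - f(x)^2)
        d_out = [(t - y) * half_alpha * (1 - y * y) for t, y in zip(target, out)]
        d_hid = [half_alpha * (1 - h * h) * sum(self.w2[k][j] * d_out[k] for k in range(len(out)))
                 for j, h in enumerate(hidden)]
        layers = ((self.w2, self.v2, d_out, hidden + [1.0]),
                  (self.w1, self.v1, d_hid, list(inputs) + [1.0]))
        for weights, velocity, deltas, x in layers:
            for i, delta in enumerate(deltas):
                for j, xj in enumerate(x):
                    step = self.lr * delta * xj + self.momentum * velocity[i][j]
                    weights[i][j] += step
                    velocity[i][j] = step
        return 0.5 * sum((t - y) ** 2 for t, y in zip(target, out))


def train(net, samples, epochs=500):
    """samples: list of (input vector, class index); targets are +1 / -1 one-hot."""
    n_out = len(net.w2)
    for _ in range(epochs):
        for x, label in samples:
            net.train_sample(x, [1.0 if k == label else -1.0 for k in range(n_out)])
```

Under these assumptions the preflop facing-a-bet network would be `TwoLayerNet(20, 3)` (10 hidden neurons), trained with `train(net, samples, epochs=500)`.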

Table 2. Dataset statistics

Move    Preflop            Flop              Turn              River             Total
Fold    3383038 (71.71%)   149375 (17.62%)   54951 (13.32%)    31097 (13.68%)    3618461 (58.31%)
Check   80197 (1.70%)      338292 (39.91%)   183084 (44.39%)   107659 (47.36%)   709232 (11.43%)
Call    556992 (11.81%)    100507 (11.86%)   49470 (11.99%)    19437 (8.55%)     726406 (11.71%)
Bet     18833 (0.40%)      226735 (26.75%)   111137 (26.94%)   62217 (27.37%)    418922 (6.75%)
Raise   678843 (14.39%)    32747 (3.86%)     13830 (3.35%)     6920 (3.04%)      732340 (11.80%)
Total   4717903 (76.03%)   847656 (13.66%)   412472 (6.65%)    227330 (3.66%)    6205361 (100.00%)
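The majority-move baseline mentioned in the dataset description can be computed directly from Table 2. The sketch below combines facing-a-bet and not-facing-a-bet moves, as Table 2 does, and the names are illustrative. On the counts as printed it gives about 64.7%, somewhat above the 61% quoted in the text, so the exact baseline depends on which moves are counted.

```python
# Per-street move counts copied from Table 2 (facing and not facing a bet combined).
TABLE_2 = {
    "preflop": {"fold": 3383038, "check": 80197, "call": 556992, "bet": 18833, "raise": 678843},
    "flop":    {"fold": 149375, "check": 338292, "call": 100507, "bet": 226735, "raise": 32747},
    "turn":    {"fold": 54951, "check": 183084, "call": 49470, "bet": 111137, "raise": 13830},
    "river":   {"fold": 31097, "check": 107659, "call": 19437, "bet": 62217, "raise": 6920},
}


def majority_baseline(counts):
    """Accuracy of always predicting the most common move on each street."""
    hits = sum(max(street.values()) for street in counts.values())
    total = sum(sum(street.values()) for street in counts.values())
    return hits / total
```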

System Outputs. As the training process takes a lot of time, each network is saved as an XML file after training and can be used later for further tests. Moreover, the results of the experiments are automatically gathered and saved: the confusion matrices and the correct classification rates are created for each network.

5 Investigation

All eight networks were trained on the given dataset 500 times using 90% of the input dataset. The remaining 10% was used for testing. The results of the experiments are shown in the confusion matrices (Table 3 and Table 4), where AC means the assigned class and TC means the true class. The confusion matrices correspond to the two groups of game states: facing a bet and not facing a bet.

Table 3. Facing a bet results

              TC - Fold   TC - Call   TC - Raise   Total
AC - Fold     75.52 %     12.42 %     11.94 %      85.47 %
AC - Call     29.16 %     59.11 %     11.47 %      11.88 %
AC - Raise    31.00 %      9.86 %     54.98 %       2.65 %
Total         69.11 %     17.87 %     13.02 %      73.16 %

Each row of the tables gives, for one assigned class, the percentage of objects that were classified correctly or incorrectly (confused), broken down by true class. The last column, denoted ‘Total’, gives the percentage of objects in the testing set that were assigned to each class, while the ‘Total’ row gives the percentage of objects in the testing set that belong to each true class. The cell where the ‘Total’ row and the ‘Total’ column cross contains the percentage of correct classifications.


Table 4. Not facing a bet results

              TC - Check   TC - Bet   Total
AC - Check    75.27 %      24.73 %    71.12 %
AC - Bet      33.41 %      66.59 %    28.88 %
Total         63.18 %      36.82 %    72.77 %
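Because each row of Tables 3 and 4 is conditioned on the assigned class, the ‘Total’ row and the overall accuracy can be recovered from the rows and the ‘Total’ column. The helper below is a sketch for checking that reading (its name and signature are illustrative). For Table 4 it reproduces the 63.18% / 36.82% true-class split and an accuracy of about 72.76%, which matches the reported 72.77% up to rounding.

```python
def summarize(rows, assigned_share):
    """rows[i][j]: % of objects assigned to class i whose true class is j (rows are AC).
    assigned_share[i]: % of test objects assigned to class i (the 'Total' column).
    Returns (% of test objects in each true class, overall accuracy in %)."""
    n = len(rows)
    true_share = [sum(assigned_share[i] * rows[i][j] for i in range(n)) / 100 for j in range(n)]
    accuracy = sum(assigned_share[i] * rows[i][i] for i in range(n)) / 100
    return true_share, accuracy


# Table 4 (not facing a bet): classes are check, bet.
TABLE_4 = [[75.27, 24.73], [33.41, 66.59]]
true_share, acc = summarize(TABLE_4, [71.12, 28.88])
```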

In order to measure the performance of clustering, we used three quality measures:

VPIP (Voluntarily Put money In Pot), which tells how often a player plays a hand preflop (does not fold preflop).

PFR (PreFlop Raise), which tells how often a player raises preflop.

AF (Aggression Factor), which tells how aggressive a player is.
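The three statistics can be computed from a player’s hand history. The record format below is hypothetical. VPIP counts a hand when the player calls, bets or raises preflop (so a free check from the big blind does not count), and AF uses the common postflop convention (bets + raises) / calls; the paper does not give its exact formulas.

```python
def player_stats(hands):
    """hands: list of dicts with 'preflop' and 'postflop' action lists (hypothetical format)."""
    n = len(hands)
    vpip = sum(any(a in ("call", "bet", "raise") for a in h["preflop"]) for h in hands) / n
    pfr = sum("raise" in h["preflop"] for h in hands) / n
    post = [a for h in hands for a in h["postflop"]]
    calls = post.count("call")
    aggressive = post.count("bet") + post.count("raise")
    af = aggressive / calls if calls else float("inf")  # AF is undefined without calls
    return {"VPIP": vpip, "PFR": pfr, "AF": af}
```

Averaging these values over the players of each cluster gives per-cluster statistics of the kind plotted in Fig. 8.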

The obtained results are shown in Fig. 8 and in Fig. 9.

Fig. 8. Performance of VPIP, PFR, and AF indicators


Fig. 9. Performance – overall accuracy.

The K-model clustering algorithm divided the player pool into 8 clusters. It may be observed in the presented graphs (Fig. 8) that after 41 iterations of the algorithm each cluster represented a different kind of player. The average VPIP in each cluster varies from 20% to about 60%, the average PFR from 12% to 22%, and the AF from 1.8 to 3.5. The “Number of players” graph also shows that the number of players in each cluster changed during the clustering process. Unfortunately, the algorithm was stopped before the stop condition (none of the players changes cluster) was met, because it had been running for a long time and the improvement in accuracy in each iteration was very small.

The overall correct classification accuracy achieved on the whole testing set was 72.8% (Fig. 9). Surprisingly, the algorithm did better for the facing-a-bet states than for the not-facing-a-bet states. Accuracy also increased during the iterations; the gain was biggest in the first several iterations, but in later phases accuracy was also continuously improving.

Comparing these results to those in [1], we observed that ours are about 4% better for the 6-player game. Unfortunately, the authors of [1] tested their algorithm on a different dataset, composed of hands played not for real money but only for “play” money. They obtained 71% accuracy in predicting opponent moves on the 6-player dataset. We were not able to access the dataset used in [1], so the comparison may not be accurate.

6 Conclusion

In this paper, a method for predicting opponents’ moves has been presented. The method uses 64 neural networks (8 game states for each of the 8 clusters) trained to predict different opponents’ actions in various stages of a game. As a result, we obtained a universal tool for predicting opponents’ moves that does not depend on the opponent’s playing style and strategy.

The overall correct classification percentage obtained with the created algorithm was 72.8%. This is better than the simplest classifier, which would choose the most common move on each betting round. It is also about 4% better than the result presented in [1], where a method based on decision trees was used. Neural networks seem to operate better in a very noisy environment, which the game of Poker surely is.

Looking at the obtained confusion matrices, it is easy to notice that a lot of moves were confused with folds and that fold was the most correctly recognized move. This stems from the fact that players usually fold to a bet. The same phenomenon can be observed in Table 2, where only 3.66% of all moves occurred on the river, the last betting round. With that in mind, one possible improvement is to compose the dataset mostly of hands that saw more action and hands in which the pot was bigger than usual. That would, however, require somehow predicting where the hand is going before the actual prediction of a move.

The K-model clustering algorithm seems to be an effective way to divide Poker players into categories based only on the hands they played in the past. It divided the player pool into clusters that differ in average statistics; thus, we may suppose that these players also use different Poker strategies or game styles. Taking this into consideration should improve the quality of overall classification.

References

1. Van der Kleij, A.A.J.: Monte Carlo Tree Search and Opponent Modeling through Player Clustering in no-limit Texas Hold’em Poker. University of Groningen, The Netherlands (2010)

2. Mccurley, P.: An Artificial Intelligence Agent for Texas Hold’em Poker. Dissertation, University of Newcastle upon Tyne, The U.K. (2009)

3. AForge.NET: documentation, available at http://code.google.com/p/aforge/

4. Xhemali, D., Hinde, C.J., Stone, R.G.: Naïve Bayes vs. Decision Trees vs. Neural Networks in the Classification of Training Web Pages. University Loughborough, Leicestershire (2009)

5. Bishop, C.M.: Pattern Recognition and Machine Learning. Springer (2009)

6. Davidson, A.: Opponent Modeling in Poker: Learning and Acting in a Hostile and Uncertain Environment. Master’s thesis, University of Alberta, Edmonton, Canada (2002)

7. Davidson, A., Billings, D., Schaeffer, J., Szafron, D.: Improved Opponent Modeling in Poker. In: Proceedings of International Conference on Artificial Intelligence, ICAI 2000, Las Vegas, Nevada, The U.S., pp. 1467–1473 (2000)

8. Poker glossary, http://www.pokerstrategy.com/glossary/
