
P. Dunne Adapted from slides created by © 2000-2005 Franz Kurfess Games 1

Single-Person Game

conventional search problem
  identify a sequence of moves that leads to a winning state
  examples: Solitaire, Dungeons and Dragons, Rubik’s cube
  has received little attention in AI

some games can be quite challenging
  some versions of solitaire
  a heuristic for Rubik’s cube was found by the Absolver program


Two-Person Game

games with two opposing players
  often called MAX and MIN
  usually MAX moves first, then MIN
  in game terminology, a move comprises one step, or ply, by each player
  typically you are MAX

MAX wants a strategy to find a winning state no matter what MIN does

MAX must assume that MIN does the same, or at least tries to prevent MAX from winning


Perfect Decisions

Optimal strategy for MAX
  traverse all relevant parts of the search tree
  this must include possible moves by MIN
  identify a path that leads MAX to a winning state

So MAX must do some work to enumerate all the possible moves (to a certain depth) from the current position and plan the best way forward such that he will win

often impractical because of time and space limitations


[Figure: partial game tree whose levels alternate between Max’s possible moves and Min’s possible moves; the first leaf nodes are scored 4, 7, 9]

Nodes are discovered using DFS

Once leaf nodes are discovered they are scored

Here Max is building a tree of possibilities

Which way should he play when the tree is finished?


Max-Min Example

[Figure: game tree with terminal nodes 4 7 9 6 9 8 8 5 6 7 5 2 3 2 5 4 9 3]

terminal nodes: values calculated from the utility function


MiniMax Example

[Figure: the game tree is scored bottom-up, one level per slide. Values shown on each level, from the root down:]

Max:             5
Min:             5 3 4
Max:             7 6 5 5 6 4
Min:             4 7 6 2 6 3 4 5 1 2 5 4 1 2 6 3 4 3
terminal nodes:  4 7 9 6 9 8 8 5 6 7 5 2 3 2 5 4 9 3

other nodes: values calculated via the minimax algorithm

Here the green nodes (Min) pick the minimum value from the nodes underneath

Here the red nodes (Max) pick the maximum value from the nodes underneath

The path through the finished tree shows the moves by Max and countermoves by Min
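The bottom-up scoring shown in the example can be sketched in a few lines of Python. This is only a minimal illustration, not code from the slides: the nested-list tree representation, the name `minimax`, and the leaf values in `toy_tree` are assumptions chosen so that the Min level comes out as 5, 3, 4 and the root as 5.

```python
from typing import List, Union

# A tree is either a leaf score (int) or a list of child sub-trees.
Tree = Union[int, List["Tree"]]


def minimax(node: Tree, maximizing: bool) -> int:
    """Return the minimax value of `node`; leaves are plain integers."""
    if isinstance(node, int):
        # Terminal node: its utility value is already known.
        return node
    # Score every child from the opponent's point of view.
    child_values = [minimax(child, not maximizing) for child in node]
    # Max picks the largest child value, Min the smallest.
    return max(child_values) if maximizing else min(child_values)


# Hypothetical three-level tree: Max at the root, Min below, then leaves.
toy_tree = [[7, 6, 5], [3, 9, 8], [6, 5, 4]]

if __name__ == "__main__":
    # The Min nodes evaluate to 5, 3 and 4, so Max obtains 5.
    print(minimax(toy_tree, maximizing=True))
```

The recursion visits nodes depth-first and left to right, which matches the DFS order assumed on the slides.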


MiniMax Observations

the values of some of the leaf nodes are irrelevant for decisions at the next level
  this also holds for decisions at higher levels

as a consequence, under certain circumstances, some parts of the tree can be disregarded
  it is possible to still make an optimal decision without considering those parts


What is pruning?

You don’t have to look at every node in the tree

pruning discards parts of the search tree that are guaranteed not to contain good moves

results in substantial time and space savings
  as a consequence, longer sequences of moves can be explored
  the leftover part of the task may still be exponential, however


Alpha-Beta Pruning

extension of the minimax approach
  results in the same move as minimax, but with less overhead
  prunes uninteresting parts of the search tree

certain moves are not considered
  they won’t result in a better evaluation value than a move further up in the tree
  they would lead to a less desirable outcome

applies to moves by both players
  alpha indicates the best choice for Max so far; it never decreases
  beta indicates the best choice for Min so far; it never increases


Note

For the following example remember:

Nodes are found with DFS

When a terminal or leaf node (or a temporary terminal node at a certain depth) is found, a utility function gives it a score

So do we need to evaluate F and G once we find E? Can we prune the tree?

[Figure: partial tree showing a node with value 5 and leaf nodes E, F, G]


Alpha-Beta Example 1

[Figure: Max root with local values [-∞, +∞]; first Min node with local values [-∞, +∞]]

Step 1: we expand the tree a little

we assume a depth-first, left-to-right search as the basic strategy

the range [-∞, +∞] of the possible values for each node is indicated
  initially the local values [-∞, +∞] reflect the values of the sub-trees in that node from Max’s or Min’s perspective
  since we haven’t expanded anything yet, they are infinite

the global values alpha and beta are the best overall choices so far for Max or Min

Global Values
  alpha, best choice for Max: ?
  beta, best choice for Min: ?

Local Values
  [-∞, +∞] at each node



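[Figure: Max root [-∞, +∞]; first Min node [-∞, 7]; leaf values seen so far: 7]

We evaluate a node

Min obtains the first value from a successor node

alpha, best choice for Max: ?
beta, best choice for Min: 7

At a Min node: if the value of the node < the value of the parent, then abandon.

…NO, because the parent has no value yet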



[Figure: Max root [-∞, +∞]; first Min node [-∞, 6]; leaf values seen so far: 7, 6]

Min obtains the second value from a successor node

alpha, best choice for Max: ?
beta, best choice for Min: 6

…NO



[Figure: Max root [5, +∞]; first Min node with exact value 5; leaf values seen so far: 7, 6, 5]

alpha, best choice for Max: 5
beta, best choice for Min: 5

Min obtains the third value from a successor node
  this is the last value from this sub-tree, and the exact value is known
  Min is finished on this branch

Max now has a value for its first successor node, but hopes that something better might still come

…No more nodes



[Figure: Max root [5, +∞]; first Min node 5 (leaves 7, 6, 5); second Min node [-∞, 3] after its first leaf, 3]

Min continues with the next sub-tree, and gets a better value

Max has a better choice from its perspective (the ‘5’) and should not consider a move into the sub-tree currently being explored by Min

…YES!



[Figure: Max root [5, +∞]; first Min node 5 (leaves 7, 6, 5); second Min node [-∞, 3] with its remaining leaves pruned]

Max won’t consider a move to this sub-tree
  it would choose 5 first => abandon expansion
  this is a case of pruning, indicated in the figure

Pruning means that we won’t do a DFS into these nodes
  No need to use the utility function to calculate the nodes’ values
  No need to explore that part of the tree further

…YES!



[Figure: Max root [5, +∞]; first Min node 5; second Min node [-∞, 3] (pruned); third Min node [-∞, 6] after its first leaf, 6]

Min explores the next sub-tree, and finds a value that is worse than the other nodes at this level

Min knows this ‘6’ is good for Max and will then evaluate the leaf nodes looking for a lower value (if it exists)

if Min is not able to find something lower, then Max will choose this branch

…NO



[Figure: Max root [5, +∞]; first Min node 5; second Min node [-∞, 3] (pruned); third Min node [-∞, 5] after leaves 6, 5]

Min is lucky, and finds a value that is the same as the current worst value at this level

Max can choose this branch, or the other branch with the same value

…YES!



[Figure: final tree; Max root 5; first Min node 5 (leaves 7, 6, 5); second Min node [-∞, 3]; third Min node [-∞, 5] (leaves 6, 5)]

Min could continue searching this sub-tree to see if there is a value that is less than the current worst alternative, in order to give Max as few choices as possible
  this depends on the specific implementation

Max knows the best value for its sub-tree

alpha, best choice for Max: 5
beta, best choice for Min: 3
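The walkthrough above can be reproduced with a short Python sketch. It is an illustration under stated assumptions rather than the slides' own code: the nested-list tree, the function name `alphabeta`, and the `visited` list are hypothetical, and the leaf values 9, 8 and 4 are placeholders for leaves the slides never reveal because they are pruned. This version also cuts off when the two bounds are equal, which corresponds to the "YES!" on the third Min node.

```python
import math
from typing import List, Optional, Union

# A tree is either a leaf score (int) or a list of child sub-trees.
Tree = Union[int, List["Tree"]]


def alphabeta(node: Tree, maximizing: bool,
              alpha: float = -math.inf, beta: float = math.inf,
              visited: Optional[List[int]] = None) -> float:
    """Minimax value of `node`, skipping branches that cannot matter."""
    if visited is None:
        visited = []
    if isinstance(node, int):
        visited.append(node)          # record every leaf actually scored
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, value)  # alpha never decreases
            if alpha >= beta:          # Min above would never allow this
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, value)        # beta never increases
        if beta <= alpha:              # Max above already has something as good
            break
    return value


# Leaves 7, 6, 5, 3, 6, 5 come from the example; 9, 8 and 4 are
# hypothetical stand-ins for the leaves that get pruned.
example_tree = [[7, 6, 5], [3, 9, 8], [6, 5, 4]]

if __name__ == "__main__":
    seen: List[int] = []
    print(alphabeta(example_tree, True, visited=seen))  # value at the root
    print(seen)                                         # leaves actually scored
```

Only six of the nine leaves are scored, yet the root value is the same one plain minimax would return.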


Properties of Alpha-Beta Pruning

in the ideal case, the best successor node is examined first
  alpha-beta can then look ahead twice as far as minimax for the same effort

assumes an idealized tree model
  uniform branching factor, path length
  random distribution of leaf evaluation values

requires additional information for good players
  game-specific background knowledge
  empirical data
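The "twice as far" claim can be made concrete by counting leaves in the idealized tree model. The sketch below assumes a uniform branching factor b and depth d, and uses the standard best-case count for alpha-beta with perfect move ordering, b^ceil(d/2) + b^floor(d/2) - 1; the function names are hypothetical.

```python
def minimax_leaf_count(b: int, d: int) -> int:
    """Leaves scored by plain minimax in a uniform tree: b^d."""
    return b ** d


def alphabeta_best_case_leaf_count(b: int, d: int) -> int:
    """Leaves scored by alpha-beta when the best move is always tried first."""
    upper_half = (d + 1) // 2   # ceil(d / 2) without floating point
    lower_half = d // 2         # floor(d / 2)
    return b ** upper_half + b ** lower_half - 1


if __name__ == "__main__":
    for depth in (2, 4, 6):
        print(depth,
              minimax_leaf_count(10, depth),
              alphabeta_best_case_leaf_count(10, depth))
```

Because the best-case count grows roughly like b^(d/2), the same budget of leaf evaluations reaches about twice the depth. With poor move ordering the saving can shrink to nothing.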


Imperfect Decisions

Does this mean I have to expand the tree all the way?

complete search is impractical for most games

alternative: search the tree only to a certain depth
  requires a cutoff test to determine where to stop


Evaluation Function

If I stop part of the way down, how do I score these nodes (which are not terminal nodes)?

Use an evaluation function to score these nodes!

must be consistent with the utility function
  values for terminal nodes (or at least their order) must be the same

tradeoff between accuracy and time cost
  without time limits, plain minimax could be used
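A depth-limited search with a cutoff test and an evaluation function might look like the following sketch. The game is a hypothetical toy chosen only to keep the example short (a pile of stones, each player removes one or two, whoever takes the last stone wins); the function names and the neutral evaluation value 0 are assumptions, not part of the slides.

```python
from typing import List


def successors(stones: int) -> List[int]:
    """Piles reachable by removing one or two stones."""
    return [stones - take for take in (1, 2) if take <= stones]


def utility(max_to_move: bool) -> int:
    """Terminal score: the player who took the last stone has won."""
    # If Max is to move on an empty pile, Min took the last stone.
    return -1 if max_to_move else 1


def evaluate(stones: int, max_to_move: bool) -> int:
    """Evaluation for non-terminal cutoff nodes: a neutral guess."""
    return 0


def depth_limited_minimax(stones: int, depth: int, max_to_move: bool) -> int:
    if stones == 0:                      # terminal test
        return utility(max_to_move)
    if depth == 0:                       # cutoff test
        return evaluate(stones, max_to_move)
    values = [depth_limited_minimax(s, depth - 1, not max_to_move)
              for s in successors(stones)]
    return max(values) if max_to_move else min(values)


if __name__ == "__main__":
    print(depth_limited_minimax(4, 10, True))  # deep enough to reach terminals
    print(depth_limited_minimax(4, 1, True))   # cut off after one ply
```

With a generous depth the search reaches terminal nodes and returns a true utility value; with depth 1 it can only return the estimate, which is where the accuracy versus time tradeoff shows up.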


Example: Tic-Tac-Toe

simple evaluation function
  E(s) = (r_x + c_x + d_x) - (r_o + c_o + d_o)
  where r, c, d are the numbers of rows, columns and diagonal lines still available for a win for X (or O) in that state;
  x and o are the pieces of the two players

1-ply lookahead
  start at the top of the tree
  evaluate all 9 choices for player 1
  pick the maximum E-value

2-ply lookahead
  also looks at the opponent’s possible moves
  assumes that the opponent picks the minimum E-value


Tic-Tac-Toe 1-Ply

[Figure: the nine boards reachable by X’s first move, each with its E-value]

E(s1:1) = 8 - 5 = 3
E(s1:2) = 8 - 6 = 2
E(s1:3) = 8 - 5 = 3
E(s1:4) = 8 - 6 = 2
E(s1:5) = 8 - 4 = 4
E(s1:6) = 8 - 6 = 2
E(s1:7) = 8 - 5 = 3
E(s1:8) = 8 - 6 = 2
E(s1:9) = 8 - 5 = 3

E(s0) = Max{E(s1:1), ..., E(s1:n)} = Max{2, 3, 4} = 4

E(s1:1) == E of the state at depth level ‘1’, number ‘1’

Simple evaluation function
  E(s1:1) = (r_x + c_x + d_x) - (r_o + c_o + d_o)
  E(s1:1) = ‘X’ has (3 rows + 3 cols + 2 diags) - ‘O’ has (2 rows + 2 cols + 1 diag)
  E(s1:1) = 8 - 5 = 3
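The 1-ply values can be checked mechanically. The sketch below is a minimal Python rendering of the simple evaluation function, assuming a board stored as a list of nine cells numbered row by row; the names `LINES`, `lines_available`, `evaluate` and `one_ply_values` are hypothetical.

```python
from typing import List

# The eight winning lines: three rows, three columns, two diagonals.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def lines_available(board: List[str], player: str) -> int:
    """Count lines that contain no piece of `player`'s opponent."""
    opponent = "O" if player == "X" else "X"
    return sum(1 for line in LINES
               if all(board[i] != opponent for i in line))


def evaluate(board: List[str]) -> int:
    """E(s): lines still open for X minus lines still open for O."""
    return lines_available(board, "X") - lines_available(board, "O")


def one_ply_values() -> List[int]:
    """E-value of each of X's nine possible first moves, cell by cell."""
    values = []
    for cell in range(9):
        board = [" "] * 9
        board[cell] = "X"
        values.append(evaluate(board))
    return values


if __name__ == "__main__":
    values = one_ply_values()
    print(values)        # corners, edges and centre get different scores
    print(max(values))   # the value Max assigns to the empty board
```

Corner moves score 3, edge moves 2 and the centre 4, so a 1-ply lookahead picks the centre.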


[Figure: 2-ply tic-tac-toe game tree with board diagrams for X’s first move and O’s replies]

E(s0) = Max{E(s1:1), ..., E(s1:n)} = Max{2, 3, 4} = 4

1-ply states (X moves):
E(s1:1) = 8 - 5 = 3
E(s1:2) = 8 - 6 = 2
E(s1:3) = 8 - 5 = 3
E(s1:4) = 8 - 6 = 2
E(s1:5) = 8 - 4 = 4
E(s1:6) = 8 - 6 = 2
E(s1:7) = 8 - 5 = 3
E(s1:8) = 8 - 6 = 2
E(s1:9) = 8 - 5 = 3

2-ply states below s1:1 (O replies):
E(s2:1) = 6 - 5 = 1
E(s2:2) = 5 - 5 = 0
E(s2:3) = 6 - 5 = 1
E(s2:4) = 4 - 5 = -1
E(s2:5) = 6 - 5 = 1
E(s2:6) = 5 - 5 = 0
E(s2:7) = 6 - 5 = 1
E(s2:8) = 5 - 5 = 0

2-ply states below s1:2 (O replies):
E(s2:9) = 5 - 6 = -1
E(s2:10) = 5 - 6 = -1
E(s2:11) = 5 - 6 = -1
E(s2:12) = 5 - 6 = -1
E(s2:13) = 5 - 6 = -1
E(s2:14) = 5 - 6 = -1
E(s2:15) = 5 - 6 = -1
E(s2:16) = 5 - 6 = -1

2-ply states below s1:5 (O replies):
E(s2:41) = 5 - 4 = 1
E(s2:42) = 6 - 4 = 2
E(s2:43) = 5 - 4 = 1
E(s2:44) = 6 - 4 = 2
E(s2:45) = 6 - 4 = 2
E(s2:46) = 5 - 4 = 1
E(s2:47) = 6 - 4 = 2
E(s2:48) = 5 - 4 = 1

Note: For 2-ply we must expand all nodes in the 1st level, but as the scores 2 and 3 repeat a few times we only expand one node for each.


[Figure: the same 2-ply tic-tac-toe game tree as on the previous slide]

It seems that the centre X (i.e. a move to s1:5) is the best, because its successor nodes have a value of at least 1 (as opposed to 0 or, worse, -1)


Checkers Case Study

initial board configuration
  Black: single on 20, single on 21, king on 31
  Red: single on 23, king on 22

evaluation function
  E(s) = (5 x1 + x2) - (5 r1 + r2)
  where
  x1 = black king advantage,
  x2 = black single advantage,
  r1 = red king advantage,
  r2 = red single advantage

board square numbering
   1  2  3  4
   5  6  7  8
   9 10 11 12
  13 14 15 16
  17 18 19 20
  21 22 23 24
  25 26 27 28
  29 30 31 32


Part 1: MiniMax using DFS

[Figure: checkers board and the first branch of the game tree]

MAX moves: 20 -> 16, 21 -> 17, 31 -> 26, 31 -> 27
MIN moves: 22 -> 17
MAX moves: 21 -> 14 (takes red king), scored 6

Typically you expand DFS-style to a certain depth and then evaluate the node:

E(s) = (5 x1 + x2) - (5 r1 + r2) = (5 + 2) - (0 + 1) = 6
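The leaf score above is plain arithmetic on piece counts, which the following minimal Python sketch spells out. It assumes that the four "advantage" terms are simply the numbers of kings and singles each side has on the board, as the worked value (5 + 2) - (0 + 1) suggests; the function and parameter names are hypothetical.

```python
def evaluate(black_kings: int, black_singles: int,
             red_kings: int, red_singles: int) -> int:
    """E(s) = (5 x1 + x2) - (5 r1 + r2), with a king weighted as 5 singles."""
    return (5 * black_kings + black_singles) - (5 * red_kings + red_singles)


# Initial configuration from the case study:
# Black has singles on 20 and 21 and a king on 31,
# Red has a single on 23 and a king on 22.
initial_score = evaluate(black_kings=1, black_singles=2,
                         red_kings=1, red_singles=1)

# After 20 -> 16, 22 -> 17, 21 -> 14 the red king has been captured.
after_capture_score = evaluate(black_kings=1, black_singles=2,
                               red_kings=0, red_singles=1)

if __name__ == "__main__":
    print(initial_score)        # (5 + 2) - (5 + 1)
    print(after_capture_score)  # (5 + 2) - (0 + 1)
```

Capturing the red king moves the score from 1 to 6, which is why this branch looks attractive to MAX.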


[Figure: checkers board and the full game tree, three levels deep (MAX, MIN, MAX)]

MAX moves from the root: 20 -> 16, 21 -> 17, 31 -> 26, 31 -> 27
MIN replies shown in the tree: 22 -> 17, 22 -> 18, 22 -> 25, 22 -> 26, 23 -> 26, 23 -> 27, 22 -> 13, 22 -> 31, 23 -> 30, 23 -> 32
MAX replies shown in the tree: 21 -> 14, 16 -> 11, 31 -> 27, 31 -> 24, 31 -> 26, 20 -> 16, 21 -> 17

We fast-forward and see the full tree to a certain depth


[Figure: the same full checkers game tree, with every leaf node scored]

leaf scores, left to right: 1 1 1 1 2 6 1 1 1 1 1 1 1 1 1 1 6 0 0 0 -4 -4 -8 -8 -8 -8

Then we score leaf nodes


[Figure: the same full checkers game tree, with minimax values on every node; the four nodes below the root receive 1, 0, -8 and -8, and the root receives 1]

Then we propagate Min-Max scores upwards


Could we make this a little easier?

This tree will expand a considerable distance

What if we introduce a little pruning?

OK then … but first we have to go back to the start.


Checkers Min-Max With Pruning

[Figure: first branch of the checkers game tree]

MAX moves: 20 -> 16, 21 -> 17, 31 -> 26, 31 -> 27
MIN moves: 22 -> 17
MAX moves: 21 -> 14, scored 6

Temporarily propagate values up: the leaf score 6 is copied to the nodes above it


Checkers Alpha-Beta Example

[Figure: checkers game tree (levels MAX, MIN, MAX) after MIN’s replies 22 -> 17 and 22 -> 18 have been expanded; MAX’s replies shown are 21 -> 14, 16 -> 11 and 31 -> 27; node values shown are 6 and 1]

At a Max node: if the value of the node > the value of the parent, then abandon.

…No! (for each node examined)

Alas, no pruning below this Max node!

Next node is expanded



[Figure: checkers game tree after MIN’s reply 22 -> 25 has also been expanded; MAX’s replies shown are 21 -> 14, 16 -> 11 and 31 -> 27; node values shown are 6 and 1]

Notice the new value propagated up!

At a Max node: if the value of the node > the value of the parent, then abandon.

…No! So it’s in the interest of the Max node to see if it can do better

…No! (the same answer for each further node examined)



[Figure: the same checkers game tree, with the remaining successors of the current Max node marked as pruned]

It’s not in MAX’s interest to bother expanding these nodes, as they have the same value as 16 -> 11, so we prune them



[Figure: two further steps of the checkers alpha-beta search; MIN’s replies 22 -> 26 and 23 -> 26 are expanded in turn; MAX’s replies shown include 16 -> 11, 31 -> 22 and 31 -> 27; node values shown are 1 and 6]


Search Limits

search must be cut off because of time or space limitations

strategies like depth-limited or iterative deepening search can be used
  these don’t take advantage of knowledge about the problem

more refined strategies apply background knowledge
  quiescence search: cut off only parts of the search space that don’t exhibit big changes in the evaluation function


Games and Computers

state of the art for some game programs
  Chess
  Checkers
  Othello
  Backgammon
  Go


Chess

Deep Blue, a special-purpose parallel computer, defeated the world champion Garry Kasparov in 1997
  the human player didn’t show his best game
  some claim that the circumstances were questionable
  Deep Blue used a massive database of games from the literature

Fritz, a program running on an ordinary PC, held the world champion Vladimir Kramnik to a draw in an eight-game match in 2002
  top programs and top human players are roughly equal


Checkers

Arthur Samuel developed a checkers program in the 1950s that learned its own evaluation function
  it reached expert level in the 1960s

Chinook became world champion in 1994
  the human opponent, Dr. Marion Tinsley, withdrew for health reasons
  Tinsley had been the world champion for 40 years

Chinook uses off-the-shelf hardware, alpha-beta search, and an end-game database for six-piece positions


Othello

Logistello defeated the human world champion in 1997

many programs play far better than humans
  smaller search space than chess
  little evaluation expertise available


Backgammon

TD-Gammon, a neural-network-based program, ranks among the best players in the world
  improves its own evaluation function through learning techniques

search-based methods are practically hopeless
  chance elements, branching factor
