Game playing
Chapter 6
Outline
♦ Games
♦ Perfect play
   – minimax decisions
   – α–β pruning
♦ Resource limits and approximate evaluation
♦ Games of chance
♦ Games of imperfect information
Games vs. search problems
“Unpredictable” opponent ⇒ solution is a strategy specifying a move for every possible opponent reply
Time limits ⇒ unlikely to find goal, must approximate
Plan of attack:
• Computer considers possible lines of play (Babbage, 1846)
• Algorithm for perfect play (Zermelo, 1912; Von Neumann, 1944)
• Finite horizon, approximate evaluation (Zuse, 1945; Wiener, 1948; Shannon, 1950)
• First chess program (Turing, 1951)
• Machine learning to improve evaluation accuracy (Samuel, 1952–57)
• Pruning to allow deeper search (McCarthy, 1956)
Types of games
                         deterministic                  chance
perfect information      chess, checkers, go, othello   backgammon, monopoly
imperfect information    battleships, blind tictactoe   bridge, poker, scrabble, nuclear war
Game tree (2-player, deterministic, turns)
[Figure: partial game tree for tic-tac-toe. Levels alternate between MAX (X) and MIN (O), starting from the empty board and ending in TERMINAL states, each labelled with a utility of −1, 0, or +1.]
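Minimax
Perfect play for deterministic, perfect-information games
Idea: choose move to position with highest minimax value = best achievable payoff against best play
E.g., 2-ply game:
[Figure: MAX root with actions A1, A2, A3 leading to three MIN nodes.
   A1: leaves A11 = 3, A12 = 12, A13 = 8 ⇒ MIN value 3
   A2: leaves A21 = 2, A22 = 4, A23 = 6 ⇒ MIN value 2
   A3: leaves A31 = 14, A32 = 5, A33 = 2 ⇒ MIN value 2
 The root value is 3, so MAX plays A1.]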
Minimax algorithm
function Minimax-Decision(state) returns an action
   inputs: state, current state in game
   return the a in Actions(state) maximizing Min-Value(Result(a, state))

function Max-Value(state) returns a utility value
   if Terminal-Test(state) then return Utility(state)
   v ← −∞
   for a, s in Successors(state) do v ← Max(v, Min-Value(s))
   return v

function Min-Value(state) returns a utility value
   if Terminal-Test(state) then return Utility(state)
   v ← ∞
   for a, s in Successors(state) do v ← Min(v, Max-Value(s))
   return v
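The pseudocode can be tried out on a small explicit tree. The following is a minimal Python sketch, not a definitive implementation: it assumes a game tree stored as nested dictionaries (an illustrative representation, not part of the slides), with bare numbers as terminal utilities for MAX. The leaf values are those of the 2-ply example (3, 12, 8; 2, 4, 6; 14, 5, 2).

```python
import math

# Assumed representation: a dict maps action names to subtrees,
# and a bare number is a terminal state's utility for MAX.
TREE = {
    "A1": {"A11": 3, "A12": 12, "A13": 8},
    "A2": {"A21": 2, "A22": 4, "A23": 6},
    "A3": {"A31": 14, "A32": 5, "A33": 2},
}

def is_terminal(state):
    return not isinstance(state, dict)

def max_value(state):
    if is_terminal(state):
        return state
    v = -math.inf
    for child in state.values():
        v = max(v, min_value(child))
    return v

def min_value(state):
    if is_terminal(state):
        return state
    v = math.inf
    for child in state.values():
        v = min(v, max_value(child))
    return v

def minimax_decision(state):
    # Pick the action whose resulting state has the highest MIN value.
    return max(state, key=lambda a: min_value(state[a]))

print(minimax_decision(TREE), max_value(TREE))  # prints: A1 3
```

The two value functions are kept separate to mirror the pseudocode; a single function with a sign flip would also work.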
Properties of minimax
Complete?? Only if tree is finite (chess has specific rules for this). NB a finite strategy can exist even in an infinite tree!
Optimal?? Yes, against an optimal opponent. Otherwise??
Time complexity?? O(b^m)
Space complexity?? O(bm) (depth-first exploration)
For chess, b ≈ 35, m ≈ 100 for “reasonable” games ⇒ exact solution completely infeasible
But do we need to explore every path?
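To get a feel for why b^m is out of reach for chess, the figures above can be plugged in directly. This is only a back-of-the-envelope sketch; the rate of one billion nodes per second is an assumption chosen for illustration, not a figure from the slides.

```python
import math

b, m = 35, 100  # branching factor and depth quoted above for chess

nodes = b ** m  # exact integer; Python ints have arbitrary precision
digits = len(str(nodes))
print(f"35^100 has {digits} decimal digits")

# Assumed (hypothetical) search speed: 10^9 nodes per second.
seconds = nodes // 10**9
years_log10 = math.log10(seconds) - math.log10(3600 * 24 * 365)
print(f"roughly 10^{years_log10:.0f} years at a billion nodes per second")
```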
α–β pruning example
[Figure, built up over five slides: the 2-ply minimax tree searched left to right.
   First MIN node: leaves 3, 12, 8 are all examined ⇒ value 3, so the MAX root is worth at least 3.
   Second MIN node: the first leaf is 2, so the node is worth at most 2; its two remaining leaves are pruned (marked X).
   Third MIN node: leaves 14, then 5, then 2 are examined ⇒ value 2.
 The root value is 3.]
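Why is it called α–β?
[Figure: a path through alternating MAX and MIN levels, ending at a MIN node with value V.]
α is the best value (to MAX) found so far off the current path
If V is worse than α, MAX will avoid it ⇒ prune that branch
Define β similarly for MIN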
The α–β algorithm
function Alpha-Beta-Decision(state) returns an action
   return the a in Actions(state) maximizing Min-Value(Result(a, state), −∞, +∞)

function Max-Value(state, α, β) returns a utility value
   inputs: state, current state in game
           α, the value of the best alternative for MAX along the path to state
           β, the value of the best alternative for MIN along the path to state
   if Terminal-Test(state) then return Utility(state)
   v ← −∞
   for a, s in Successors(state) do
      v ← Max(v, Min-Value(s, α, β))
      if v ≥ β then return v
      α ← Max(α, v)
   return v

function Min-Value(state, α, β) returns a utility value
   same as Max-Value but with roles of α, β reversed
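A minimal Python sketch of the pseudocode follows, run on the tree from the pruning example. The nested-list representation and the `visited` log are assumptions made for illustration. One deliberate difference from the decision function as stated: this sketch threads α across the root moves so that pruning also benefits from earlier root results.

```python
import math

TREE = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]  # nested lists; numbers are leaves
visited = []  # leaves actually evaluated, in order

def max_value(state, alpha, beta):
    if not isinstance(state, list):
        visited.append(state)
        return state
    v = -math.inf
    for s in state:
        v = max(v, min_value(s, alpha, beta))
        if v >= beta:          # MIN already has a better alternative
            return v
        alpha = max(alpha, v)
    return v

def min_value(state, alpha, beta):
    if not isinstance(state, list):
        visited.append(state)
        return state
    v = math.inf
    for s in state:
        v = min(v, max_value(s, alpha, beta))
        if v <= alpha:         # MAX already has a better alternative
            return v
        beta = min(beta, v)
    return v

def alpha_beta_decision(state):
    """Return (index of best root action, its value)."""
    best_action, alpha = None, -math.inf
    for i, s in enumerate(state):
        v = min_value(s, alpha, math.inf)
        if v > alpha:
            best_action, alpha = i, v
    return best_action, alpha

print(alpha_beta_decision(TREE))  # prints: (0, 3)
print(visited)                    # prints: [3, 12, 8, 2, 14, 5, 2]
```

Seven of the nine leaves are evaluated; the leaves 4 and 6 under the second MIN node are never visited, and the decision is the same as plain minimax.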
Properties of α–β
Pruning does not affect final result
Good move ordering improves effectiveness of pruning
With “perfect ordering,” time complexity = O(b^(m/2)) ⇒ doubles solvable depth
A simple example of the value of reasoning about which computations are relevant (a form of metareasoning)
Unfortunately, 35^50 is still impossible!
Resource limits
Standard approach:
• Use Cutoff-Test instead of Terminal-Test
   e.g., depth limit (perhaps add quiescence search)
• Use Eval instead of Utility
   i.e., evaluation function that estimates desirability of position
Suppose we have 100 seconds, explore 10^4 nodes/second
⇒ 10^6 nodes per move ≈ 35^(8/2)
⇒ α–β reaches depth 8 ⇒ pretty good chess program
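The depth estimate can be checked by solving b^(m/2) = N for m. The helper below is a hypothetical name introduced for illustration; it assumes the O(b^(m/2)) cost of α–β with perfect ordering and the O(b^m) cost of plain minimax.

```python
import math

def reachable_depth(nodes_per_move, b, perfect_ordering=True):
    """Depth m with b^(m/2) = nodes_per_move (or b^m without pruning)."""
    plies = math.log(nodes_per_move) / math.log(b)
    return 2 * plies if perfect_ordering else plies

budget = 100 * 10**4  # 100 seconds at 10^4 nodes/second = 10^6 nodes

print(round(reachable_depth(budget, 35), 1))         # well-ordered alpha-beta
print(round(reachable_depth(budget, 35, False), 1))  # plain minimax
```

Under these assumptions the budget reaches a depth of just under 8 with well-ordered α–β and just under 4 without pruning.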
Evaluation functions
[Figure: two chess positions. One is captioned “Black to move, White slightly better”; the other “White to move, Black winning”.]
For chess, typically linear weighted sum of features
Eval(s) = w_1 f_1(s) + w_2 f_2(s) + ... + w_n f_n(s)
e.g., w_1 = 9 with f_1(s) = (number of white queens) – (number of black queens), etc.
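A linear evaluation of this form is short to write down. In the sketch below only the queen weight of 9 comes from the text; the other weights are conventional material values used as an assumption, and the string-of-letters position format is a stand-in for a real board representation.

```python
# Only the queen weight (9) is from the text; the rest are assumed values.
WEIGHTS = {"Q": 9, "R": 5, "B": 3, "N": 3, "P": 1}

def material_features(position):
    """f_i(s) = (# white pieces of type i) - (# black pieces of type i).

    `position` is a string of piece letters: uppercase for White,
    lowercase for Black (hypothetical format for illustration)."""
    return {p: position.count(p) - position.count(p.lower()) for p in WEIGHTS}

def eval_position(position):
    """Eval(s) = sum of w_i * f_i(s), from White's point of view."""
    feats = material_features(position)
    return sum(WEIGHTS[p] * feats[p] for p in WEIGHTS)

# White: queen, two rooks, four pawns. Black: two rooks, bishop, four pawns.
print(eval_position("QRRPPPP" + "rrbpppp"))  # prints: 6
```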
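Digression: Exact values don’t matter
[Figure: two MAX-over-MIN trees with the same shape.
   Left: leaves (1, 2) and (2, 4) ⇒ MIN values 1 and 2 ⇒ MAX value 2.
   Right: leaves (1, 20) and (20, 400) ⇒ MIN values 1 and 20 ⇒ MAX value 20.
 MAX chooses the same move in both.]
Behaviour is preserved under any monotonic transformation of Eval
Only the order matters: payoff in deterministic games acts as an ordinal utility function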
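The claim is easy to check numerically. The sketch below assumes a two-level tree given as a list of MIN nodes, each a list of leaf values, and applies a few increasing transformations; the first two transformations are arbitrary choices for illustration, the third is the relabelling 1 → 1, 2 → 20, 4 → 400.

```python
def minimax_choice(tree):
    """tree: list of MIN nodes, each a list of leaf values.
    Returns the index of the move MAX chooses."""
    mins = [min(leaves) for leaves in tree]
    return max(range(len(tree)), key=lambda i: mins[i])

tree = [[1, 2], [2, 4]]

# Increasing (order-preserving) transformations of the leaf values.
transforms = {
    "square": lambda x: x * x,
    "cube plus 7": lambda x: x ** 3 + 7,
    "relabel 1->1, 2->20, 4->400": {1: 1, 2: 20, 4: 400}.get,
}

for name, f in transforms.items():
    transformed = [[f(x) for x in leaves] for leaves in tree]
    same = minimax_choice(transformed) == minimax_choice(tree)
    print(f"{name}: same move = {same}")
```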
Deterministic games in practice
Checkers: Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994. Used an endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 443,748,401,247 positions.
Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue searches 200 million positions per second, uses very sophisticated evaluation, and uses undisclosed methods for extending some lines of search up to 40 ply.
Othello: human champions refuse to compete against computers, who are too good.
Go: human champions refuse to compete against computers, who are too bad. In go, b > 300, so most programs use pattern knowledge bases to suggest plausible moves.
Nondeterministic games: backgammon
[Figure: backgammon board with points numbered 1–12 along one side and 24–13 along the other, plus positions 0 and 25.]
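Nondeterministic games in general
In nondeterministic games, chance introduced by dice, card-shuffling
Simplified example with coin-flipping:
[Figure: MAX root over two CHANCE nodes, each with two branches of probability 0.5 leading to MIN nodes.
   MIN leaves (2, 4), (7, 4), (6, 0), (5, −2) ⇒ MIN values 2, 4, 0, −2
   CHANCE values 3 and −1]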
Algorithm for nondeterministic games
Expectiminimax gives perfect play
Just like Minimax, except we must also handle chance nodes:
   ...
   if state is a Max node then
      return the highest ExpectiMinimax-Value of Successors(state)
   if state is a Min node then
      return the lowest ExpectiMinimax-Value of Successors(state)
   if state is a chance node then
      return average of ExpectiMinimax-Value of Successors(state)
   ...
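A minimal Python sketch of this recursion, run on the coin-flipping example, might look as follows. The tuple-based node format is an assumption made for illustration, and "average" is taken here to mean the probability-weighted average over a chance node's outcomes.

```python
def expectiminimax(node):
    """node is a number (terminal) or a tuple (kind, children).

    For "max"/"min", children is a list of nodes; for "chance" it is a
    list of (probability, node) pairs (assumed format)."""
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    if kind == "chance":
        return sum(p * expectiminimax(c) for p, c in children)
    raise ValueError(f"unknown node kind: {kind}")

# Coin-flip example: MAX over two chance nodes, each over two MIN nodes.
COIN_TREE = ("max", [
    ("chance", [(0.5, ("min", [2, 4])), (0.5, ("min", [7, 4]))]),
    ("chance", [(0.5, ("min", [6, 0])), (0.5, ("min", [5, -2]))]),
])

print(expectiminimax(COIN_TREE))  # prints: 3.0
```

The two chance nodes evaluate to 3 and −1, so MAX takes the left branch.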
Nondeterministic games in practice
Dice rolls increase b: 21 possible rolls with 2 dice
Backgammon ≈ 20 legal moves (can be 6,000 with 1-1 roll)
   depth 4 = 20 × (21 × 20)^3 ≈ 1.5 × 10^9
As depth increases, probability of reaching a given node shrinks ⇒ value of lookahead is diminished
α–β pruning is much less effective
TDGammon uses depth-2 search + very good Eval ≈ world-champion level
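Digression: Exact values DO matter
[Figure: two MAX-over-DICE-over-MIN trees with the same shape; each DICE node has branches of probability .9 and .1.
   Left: MIN values 2, 3 and 1, 4 ⇒ DICE values 2.1 and 1.3 ⇒ MAX prefers the first move.
   Right: MIN values 20, 30 and 1, 400 ⇒ DICE values 21 and 40.9 ⇒ MAX prefers the second move.]
Behaviour is preserved only by positive linear transformation of Eval
Hence Eval should be proportional to the expected payoff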
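The effect can be reproduced with a few lines of arithmetic. The sketch below assumes each move leads to a chance node given as a list of (probability, value) pairs, using the values 2, 3, 1, 4 with probabilities .9 and .1; the particular linear map 10x + 5 is an arbitrary illustrative choice.

```python
def best_move(chance_nodes):
    """chance_nodes: one list of (probability, value) pairs per move.
    Returns (index of the move with highest expected value, expected values)."""
    expected = [sum(p * v for p, v in outcomes) for outcomes in chance_nodes]
    return max(range(len(expected)), key=lambda i: expected[i]), expected

original = [[(0.9, 2), (0.1, 3)], [(0.9, 1), (0.1, 4)]]

# Order-preserving but nonlinear relabelling: 1->1, 2->20, 3->30, 4->400.
rescale = {1: 1, 2: 20, 3: 30, 4: 400}
monotone = [[(p, rescale[v]) for p, v in node] for node in original]

# Positive linear transformation (arbitrary example): x -> 10x + 5.
linear = [[(p, 10 * v + 5) for p, v in node] for node in original]

print(best_move(original)[0])  # prints: 0
print(best_move(monotone)[0])  # prints: 1  (the decision flips)
print(best_move(linear)[0])    # prints: 0  (the decision is preserved)
```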
Games of imperfect information
E.g., card games, where opponent’s initial cards are unknown
Typically we can calculate a probability for each possible deal
Seems just like having one big dice roll at the beginning of the game*
Idea: compute the minimax value of each action in each deal, then choose the action with highest expected value over all deals*
Special case: if an action is optimal for all deals, it’s optimal.*
GIB, current best bridge program, approximates this idea by
   1) generating 100 deals consistent with bidding information
   2) picking the action that wins most tricks on average
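The two steps above amount to a Monte Carlo average over sampled deals. The following is a rough sketch of that idea only, not a description of how GIB is implemented: `sample_deal` and `value` are hypothetical hooks, and the toy "finesse" scenario is invented for illustration.

```python
import random

def choose_action(actions, sample_deal, value, n_deals=100, seed=0):
    """Average each action's perfect-information value over sampled deals.

    sample_deal(rng) -> a deal consistent with what is known (hypothetical hook)
    value(action, deal) -> value of `action` when the deal is fully known,
                           e.g. tricks won under best play (hypothetical hook)"""
    rng = random.Random(seed)
    deals = [sample_deal(rng) for _ in range(n_deals)]
    averages = {a: sum(value(a, d) for d in deals) / n_deals for a in actions}
    return max(actions, key=averages.get), averages

# Toy stand-ins: a "deal" is just which opponent holds a missing ace.
def toy_sample_deal(rng):
    return rng.choice(["left", "right"])

def toy_value(action, deal):
    if action == "safe":
        return 2                       # always two tricks
    return 3 if deal == "left" else 0  # "finesse" works only if the ace is on the left

action, averages = choose_action(["safe", "finesse"], toy_sample_deal, toy_value)
print(action, averages)
```

As the asterisks above signal, this averaging is an approximation with a known flaw, which the later slides discuss.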
Example
Four-card bridge/whist/hearts hand, MAX to play first
[Figure, built up over three slides: game trees with alternating MAX and MIN levels for a four-card hand. The first two trees, for two different fully specified deals, are each labelled with value 0; the third tree is labelled with value −0.5. The card holdings themselves are not recoverable from the extracted text.]
Commonsense example
Road A leads to a small heap of gold pieces
Road B leads to a fork:
   take the left fork and you’ll find a mound of jewels;
   take the right fork and you’ll be run over by a bus.
Road A leads to a small heap of gold pieces
Road B leads to a fork:
   take the left fork and you’ll be run over by a bus;
   take the right fork and you’ll find a mound of jewels.
Road A leads to a small heap of gold pieces
Road B leads to a fork:
   guess correctly and you’ll find a mound of jewels;
   guess incorrectly and you’ll be run over by a bus.
Proper analysis
* Intuition that the value of an action is the average of its values in all actual states is WRONG
With partial observability, value of an action depends on the information state or belief state the agent is in
Can generate and search a tree of information states
Leads to rational behaviors such as
♦ Acting to obtain information
♦ Signalling to one’s partner
♦ Acting randomly to minimize information disclosure
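The road example makes the gap between the two analyses concrete. The sketch below is illustrative only: the numeric payoffs are hypothetical (the slides give just their ordering), and the two bus locations are assumed equally likely.

```python
# Hypothetical payoffs; only their ordering is implied by the example.
GOLD, JEWELS, BUS = 1, 10, -100

# Two equally likely actual states: which fork hides the bus (assumption).
STATES = {"bus_right": 0.5, "bus_left": 0.5}
PLANS = ["A", "B-left", "B-right"]

def payoff(plan, state):
    if plan == "A":
        return GOLD
    fork = plan.split("-")[1]
    return BUS if state == f"bus_{fork}" else JEWELS

def road_value_if_state_known(road, state):
    # Knowing the state, the agent picks the best fork after choosing the road.
    return max(payoff(p, state) for p in PLANS if p[0] == road)

# Flawed analysis: average each road's value *as if the state were known*.
averaged = {r: sum(pr * road_value_if_state_known(r, s) for s, pr in STATES.items())
            for r in "AB"}

# Belief-state analysis: the whole plan is fixed without seeing the state.
belief = {p: sum(pr * payoff(p, s) for s, pr in STATES.items()) for p in PLANS}

print(averaged)  # prints: {'A': 1.0, 'B': 10.0}
print(belief)    # prints: {'A': 1.0, 'B-left': -45.0, 'B-right': -45.0}
```

Averaging over known states recommends road B, while the belief-state analysis, in which the fork must be guessed, recommends road A.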
Summary
Games are fun to work on! (and dangerous)
They illustrate several important points about AI
♦ perfection is unattainable ⇒ must approximate
♦ good idea to think about what to think about
♦ uncertainty constrains the assignment of values to states
♦ optimal decisions depend on information state, not real state
Games are to AI as grand prix racing is to automobile design