computer go: an ai oriented survey artificial intelligence, volume 132, issue 1, october 2001, pages...

Computer Go: An AI oriented survey

Artificial Intelligence, Volume 132, Issue 1, October 2001, Pages 39-103

Bruno Bouzy, Tristan Cazenave

2004.11.11 劉思源

Outline

1. Introduction (Go)
2. Other games
3. Results
4. Evaluation
5. Move generation
6. Tree search
7. Optimization
8. Combinatorial game theory
9. Automatic generation of knowledge
10. Monte Carlo Go
11. Mathematical morphology
12. Cognitive science
13. Conclusion

Monte Carlo Go

Monte Carlo methods are derived from statistical physics.

The mechanisms used to find extrema are fundamental in classical physics, in relativistic physics, and in quantum physics as well.

For example, at high temperatures, a metal is liquid and its atoms move randomly, but when the metal is cooled, the atoms put themselves into a configuration that minimizes energy: a crystalline structure. This process is called annealing. The longer the cooling, the closer the cooled structure is to the minimum of energy.

Monte Carlo Go

The evolution of the system is approximately assessed by choosing a move with a probability that depends on the growth in activity resulting from the move.

The probability p(E) that a particle has the energy E at a temperature T is p(E) = exp(−ΔE / kT), k being the Boltzmann constant.
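The Boltzmann factor above can be turned into a move-acceptance rule. The following is a minimal sketch, assuming the usual Metropolis convention in which a move that does not increase the energy is always accepted; that convention, the function names, and the default k = 1 are illustrative assumptions, not details given on the slide.

```python
import math
import random


def acceptance_probability(delta_e: float, temperature: float, k: float = 1.0) -> float:
    """Probability of accepting a move that changes the energy by delta_e.

    Assumption (Metropolis convention): moves that do not increase the
    energy are always accepted; otherwise the Boltzmann factor
    exp(-delta_e / (k * T)) from the text is used.
    """
    if delta_e <= 0:
        return 1.0
    return math.exp(-delta_e / (k * temperature))


def accept_move(delta_e: float, temperature: float, rng: random.Random, k: float = 1.0) -> bool:
    """Draw a uniform number in [0, 1) and accept the move with the probability above."""
    return rng.random() < acceptance_probability(delta_e, temperature, k)
```

At a high temperature almost every move is accepted, which mirrors the random motion of atoms in the liquid metal; as the temperature drops, energy-increasing moves become rare.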

Monte Carlo Go

Simulated annealing

In the travelling salesman problem, there are N! different paths between the N cities. Simulated annealing finds a solution close to the optimal one in polynomial time.

The Gobble program uses simulated annealing to find an approximation of the best move on a 9 × 9 board.
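To make the N-cities example concrete, here is a small sketch of simulated annealing on a travelling-salesman tour. It assumes Euclidean cities, a segment-reversal move, and a geometric cooling schedule; these choices, the parameter defaults, and the function names are illustrative assumptions. It is not a description of how Gobble applies annealing to Go moves.

```python
import math
import random


def tour_length(tour, cities):
    """Total length of the closed tour visiting cities in the given order."""
    return sum(
        math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def anneal_tour(cities, steps=20000, t_start=1.0, t_end=1e-3, seed=0):
    """Return (best_tour, best_length) found by simulated annealing."""
    rng = random.Random(seed)
    n = len(cities)
    tour = list(range(n))
    rng.shuffle(tour)
    length = tour_length(tour, cities)
    best, best_length = tour[:], length
    for step in range(steps):
        # Geometric cooling from t_start towards t_end (an assumed schedule).
        temperature = t_start * (t_end / t_start) ** (step / steps)
        # Candidate move: reverse the segment between two random positions.
        i, j = sorted(rng.sample(range(n), 2))
        candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
        delta = tour_length(candidate, cities) - length
        # Accept improvements always, worsenings with probability exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            tour, length = candidate, length + delta
            if length < best_length:
                best, best_length = tour[:], length
    return best, best_length
```

Each step costs O(N) here, so the run time is polynomial in N for a fixed number of steps, at the price of returning only an approximate optimum.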

Mathematical morphology

This section highlights the link between image processing and Computer Go.

The size of the board (19 × 19) on which the game is played is much smaller than the size of the pictures (more than 1000 × 1000) processed in the pattern recognition domain.

Therefore, the complexity of the game of Go is situated far below the complexity of image processing.

Mathematical morphology

This model is the ancestor of the refinements used today in Go programs.

For example, the Indigo program makes explicit use of mathematical morphology for territory and influence modeling.

GnuGo also uses this model.

Morphological dilation and morphological erosion

Opening and Closing operators

Let A be a set of elements, and let D(A) be the morphological dilation of A, composed of A plus the neighboring elements of A.

E(A) is the morphological erosion of A. It is composed of A minus the elements which are neighbors of the complement of A.

ExtBound(A) is the morphological external boundary of A, given by ExtBound(A) = D(A) − A.

IntBound(A) is the morphological internal boundary of A, given by IntBound(A) = A − E(A).

Closing(A) is the morphological closing of A, where Closing(A) = E(D(A)).

Opening(A) is the morphological opening of A, given by Opening(A) = D(E(A)).

The opening and closing operators are very helpful in image processing.
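The six set operators defined above can be sketched directly on sets of board points. This is a minimal illustration, assuming a square board and orthogonal (4-connected) neighbors; the board-edge handling and the function names are assumptions made for the example.

```python
def neighbors(point, size):
    """Yield the on-board orthogonal neighbors of a (row, col) point."""
    r, c = point
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < size and 0 <= nc < size:
            yield (nr, nc)


def dilation(a, size):
    """D(A): A plus the neighboring elements of A."""
    return set(a) | {n for p in a for n in neighbors(p, size)}


def erosion(a, size):
    """E(A): A minus the elements that neighbor the complement of A."""
    return {p for p in a if all(n in a for n in neighbors(p, size))}


def ext_bound(a, size):
    """ExtBound(A) = D(A) - A."""
    return dilation(a, size) - set(a)


def int_bound(a, size):
    """IntBound(A) = A - E(A)."""
    return set(a) - erosion(a, size)


def closing(a, size):
    """Closing(A) = E(D(A))."""
    return erosion(dilation(a, size), size)


def opening(a, size):
    """Opening(A) = D(E(A))."""
    return dilation(erosion(a, size), size)
```

For a 3 × 3 block in the middle of a 5 × 5 board, the erosion is the single center point and the opening is the plus-shaped set around it, so opening removes the block's corners while closing never removes points of A.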

"Territories" and "influence"

We start by assigning values of +64 (respectively −64) to black (respectively white) intersections, and 0 elsewhere.

The D operator now consists of adding to the absolute value of an intersection of one color the number of neighboring intersections of this color, provided that all the neighboring intersections are empty or of that color.

For an empty intersection which has neighboring intersections of the same color, D also adds the number of neighboring intersections of this color to the absolute value of the intersection.

Similarly, the E operator now consists of subtracting from the absolute value of an intersection of one color the number of neighboring intersections whose value is either zero or corresponds to the opposite color of the intersection.

Once these refinements are added, we use the operators X = E ∗ E ∗ ··· ∗ E ∗ D ∗ D ∗ ··· ∗ D and Y = D ∗ D ∗ ··· ∗ D, where E is composed 'e' times and D 'd' times.
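The numerical D and E operators described above can be sketched on a grid of signed integers (positive for black, negative for white). This is one reading of the description, not the Indigo or GnuGo source code: clamping eroded values at zero so they never change sign is an assumption, as are the function names.

```python
def neighbors(r, c, size):
    """Yield the on-board orthogonal neighbors of intersection (r, c)."""
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < size and 0 <= nc < size:
            yield nr, nc


def dilate(board):
    """One application of D on a square grid of signed integers."""
    size = len(board)
    out = [row[:] for row in board]
    for r in range(size):
        for c in range(size):
            vals = [board[nr][nc] for nr, nc in neighbors(r, c, size)]
            pos = sum(1 for v in vals if v > 0)
            neg = sum(1 for v in vals if v < 0)
            v = board[r][c]
            if v >= 0 and neg == 0:
                out[r][c] = v + pos   # black or empty, no white neighbor
            elif v <= 0 and pos == 0:
                out[r][c] = v - neg   # white or empty, no black neighbor
    return out


def erode(board):
    """One application of E; values are clamped at zero (an assumption)."""
    size = len(board)
    out = [row[:] for row in board]
    for r in range(size):
        for c in range(size):
            vals = [board[nr][nc] for nr, nc in neighbors(r, c, size)]
            v = board[r][c]
            if v > 0:
                out[r][c] = max(0, v - sum(1 for x in vals if x <= 0))
            elif v < 0:
                out[r][c] = min(0, v + sum(1 for x in vals if x >= 0))
    return out


def apply_operators(board, d, e):
    """Apply D 'd' times, then E 'e' times: Y when e == 0, X otherwise."""
    for _ in range(d):
        board = dilate(board)
    for _ in range(e):
        board = erode(board)
    return board
```

For a lone black stone valued 64 on an empty 5 × 5 grid, one dilation gives each adjacent intersection the value 1, and one erosion then removes that halo again, so `apply_operators(board, d=1, e=1)` returns the starting grid.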

Fig. 35 shows the result of Y, with d = 5 and e = 0; it shows what Go players call "influence".

Fig. 36 shows the result of X, with d = 5 and e = 21; it shows the "territories" quite accurately.

D operator & E operator

Bouzy has shown that e = d ∗ (d − 1) + 1, in which 'd' is a scaling factor.

The bigger 'd' is, the larger the scale of recognized territories is.

This technique was part of the evaluation function (EF) of the Indigo program, and has now been integrated into the GnuGo program.
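As a quick check, the relation between e and d reproduces the parameters quoted for Fig. 36 (d = 5, e = 21). The other two values are shown only to illustrate how e grows with the scaling factor.

```latex
e = d\,(d - 1) + 1
\qquad
d = 3 \Rightarrow e = 7,
\quad
d = 4 \Rightarrow e = 13,
\quad
d = 5 \Rightarrow e = 5 \cdot 4 + 1 = 21
```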

Cognitive science

The game of Go is appropriate for carrying out cognitive science studies. It is quite justifiable to choose the game of Go as a domain to perform cognitive experiments.

Related works

Measuring response times

Analyzing verbal reports

Conclusion: natural language plays an important role in playing Go.

Measuring response times:

Chase and Simon: a "chunk" can be defined as a cluster of information.

Chess experts reconstruct actual Chess positions more quickly; they seem to have a greater memory capacity than non-experts.

However, with random positions, the authors observed that the experts' and the non-experts' performances were equal.

The explanation given by the authors was that the number of memorized chunks is equal for experts and non-experts, but that experts memorize more specialized chunks.

Example: the ability to replay a Go game from memory. After a game between two strong players, a strong player can replay the game immediately and recalls the moves quickly. After a game against a very weak player, replaying the game is very slow and very difficult.

Analyzing verbal reports:

Saito and Yoshikawa showed that human players use natural language terms to play their games.

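Conclusion

Summary

Future work

To limit tree search, it must be combined with a very good heuristic function that can find the best moves.

How to carry out tree search aimed at connecting, cutting, or some other goal remains an unsolved problem.

Parallel processing may be a feasible approach for the evaluation function of a game program.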

Conclusion

Today, young professional players still give 9 handicap stones to the best programs, and players who are used to playing against programs are able to give as many as 29 handicap stones to these programs.

During the first few games, when its human opponent confronts the strengths of the computer, the program may give the illusion of being stronger than it actually is, and it plays at its "high" level.

Some games later, the human opponent discovers the weaknesses of the computer, and still later, the human opponent identifies almost all the weaknesses of the computer, whose level generally drops to its "low" level.

Nowadays, the "high" level of the programs may be assessed at 5th kyu; this corresponds to an average player in a Go club.

However, their "low" level ranks at 15th kyu, namely a beginner level.

As long as this gap between the low and the high levels is not reduced, it is risky to make any prediction about the evolution of the level of Go programs.

Conclusion

Today, young professional players can give the best programs 9 handicap stones, and players who often play against programs can give as many as 29. When a human first plays against a program, the program creates the illusion of being strong and plays at its "high" level. After more games, the human starts to discover the program's weaknesses, and once all of them are known, the level of play naturally drops to the "low" level.

At present, the "high" level of Go programs can be rated at 5th kyu, roughly the average level of people who play Go, while their "low" level is usually only 15th kyu, the level of a beginner.

As long as this gap is not reduced, the room for improvement of Go programs can be estimated to remain large.
