monte-carlo search algorithms

Monte-Carlo Search AlgorithmsDaniel Bjorge and John Schaeffer

Problem

Important stuff

Where it needs to

be recognize

Solution

Search Algorithms

Which isfaster?

Best early prediction?

Best β parameter?

ComparingSearch Algorithms

Option 1: Computer Go

State expansion

SlowTermination detection

Slower

Heuristics

Lassú

Option 2: Artificial Trees

State expansion

FastTermination detection

Faster

Heuristics

010110

Existing FrameworksP-Game

Kocsis2006Artificial

Simple

FuegoEnzenberger and

Müller

Generic

FastScalabl

Existing FrameworksP-Game

Kocsis2006

Small Scale

FuegoEnzenberger and

Müller

Complex

Only Go Implement

Gomba Search Framework(Go Multi-Armed Bandit Analysis)

Scalable

Multiple Metrics

Tweakable Trees

Swappable Search Strategies

Finom a Paprikásban!

Eager Game Tree Generation

Tree Generat

orMemory

Search Algorith

(Small trees)

Eager Game Tree Generation

Tree Generato

r Memory

(Big trees)

Lazy Game Tree Generation

Tree Generat

Memory

Search Algorith

(Any trees)

Minimax Values

7 0 -4 9 6 3 4 -91

Maximizing

Minimizing

Minimax Values

7 0 1 -4 9 6 3 4 -9

Maximizing

Minimizing

7 0 1 -4 9 6 3 4 -9

Minimax Values

O (bd /2 )Maximizing

Minimizing

Downward Minimax Evaluation

7 0 1 4 9 6 3 4 0

Example

0Maximizing

Minimizing

Difficulty

Minimax

Optimal

White to play

Easier

Quantified Difficulty

0Maximizing

Minimizing

Application to Search Algorithms

Many-Armed Bandit Solutions

10837...

0100...

1419-27-13...

1222...

-12-4-2-8…

UCTUpper Confidence Bound for Trees

mal arm

(Kocsis and Szepesvári, 2006)

UCT-FPUUCT with First Play Urgency

mal arm

(Hartland et al., 2006)

Infinitely-Armed Bandit Solutions

10837...

-12-4-2-8…

Infinite-Armed Bandit SolutionsAdapting to Trees

…for treesK-

Failure

UCT-V ()UCB-V ()

UCT-V () AIRUCB-V () AIR

(Wang, Audibert, and Munos, 2006)

K-Failure

Random

UCT-FPU

1-Failure

3-Failure

10-Failure

mal arm

UCT-V ()

… … … … … … … … …

UCT-V () with AIR

… … … … … … … … …

UCT-V () with AIR: Gomba

mal arm

In Fuego

UCT UCT-V UCT-V (∞), β=.1

UCT-V (∞), β=.2

Fuego vs GnuGo, 40s moves

Upper 95% Con-fidence BoundLower 95% Con-fidence Bound

Search Algorithm

Wrapping Up

Gomba• Fast• Scalable• Reasonably accurate• Simple• InterchangeableInfinite Bandit-Based Solutions• Improvement

Questions?

Köszönjük szépen!

Advisors• Kocsis Levente• Sárközy Gábor• Stanley Selkow

SZTAKI Info Lab• Benczúr András• Recski Gábor• Zséder Attila• Kornai András• Erdélyi Miklós

Fuego Development Team

monte-carlo search algorithms

Documents

cs 771 artificial...

improved estimation and uncertainty quanti cation...

engineering motif search for large graphs · pdf filetight...

search algorithms for substructures in generalized...

data structure and algorithms - search

performance of monte carlo tree search algorithms when...

monte carlo methods for web search

局所探索によるＣＳＰの解法 (local search...

monte-carlo tree search for multi-player games - dke...

ai-2 탐색과 최적화-i -...

data structures and algorithms 10€¦ · chapter 10 search...

algorithms: linear and binary search

acceptance rates (1) (black)...

instytut badań systemowych · carlo (ang. monte carlo tree...

large scale monte carlo tree search on...

tree-based search focused on red-black tree and trie and...

search algorithms for substructures in generalized...

exploring monte carlo tree search for join order...

application of monte carlo tree search in a fighting game ai...

07 (algorithms) graph - clickseo...