monte-carlo search algorithms

Post on 30-Dec-2015

28 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

DESCRIPTION

Monte-Carlo Search Algorithms. Daniel Bjorge and John Schaeffer. Problem. Where it needs to be recognized. Important stuff. Solution. Best early prediction?. Which is faster?. Search Algorithms. Best β parameter?. Comparing Search Algorithms. Option 1: Computer Go. - PowerPoint PPT Presentation

TRANSCRIPT

Monte-Carlo Search AlgorithmsDaniel Bjorge and John Schaeffer

Problem

Important stuff

Where it needs to

be recognize

d

Solution

Search Algorithms

Which isfaster?

Best early prediction?

Best β parameter?

ComparingSearch Algorithms

Option 1: Computer Go

State expansion

SlowTermination detection

Slower

Heuristics

Lassú

Option 2: Artificial Trees

State expansion

FastTermination detection

Faster

Heuristics

Fast

010110

010110

010110

010110

010110

010110

010110

010110

010110

Existing FrameworksP-Game

Kocsis2006Artificial

Fast

Simple

FuegoEnzenberger and

Müller

2009

Generic

FastScalabl

e

Existing FrameworksP-Game

Kocsis2006

Small Scale

FuegoEnzenberger and

Müller

2009

Complex

Only Go Implement

ed

Gomba Search Framework(Go Multi-Armed Bandit Analysis)

Fast

Scalable

Multiple Metrics

Tweakable Trees

Swappable Search Strategies

Finom a Paprikásban!

Eager Game Tree Generation

Tree Generat

orMemory

Search Algorith

m

(Small trees)

Eager Game Tree Generation

Tree Generato

r Memory

(Big trees)

Lazy Game Tree Generation

Tree Generat

or

Memory

Search Algorith

m

(Any trees)

Minimax Values

7 0 -4 9 6 3 4 -91

Maximizing

Minimizing

Minimax Values

4

7 9 4

7 0 1 -4 9 6 3 4 -9

Maximizing

Minimizing

4

7 9 4

7 0 1 -4 9 6 3 4 -9

Minimax Values

O (bd /2 )Maximizing

Minimizing

Go:

Downward Minimax Evaluation

4

7 9 4

7 0 1 4 9 6 3 4 0

15

Example

5 0

0Maximizing

Minimizing

Difficulty

Minimax

Optimal

White to play

Easier

15

Quantified Difficulty

5 0

0Maximizing

Minimizing

Application to Search Algorithms

Many-Armed Bandit Solutions

10837...

0100...

1419-27-13...

1222...

-12-4-2-8…

UCTUpper Confidence Bound for Trees

Opti

mal arm

choic

e r

ate

(Kocsis and Szepesvári, 2006)

UCT-FPUUCT with First Play Urgency

Opti

mal arm

choic

e r

ate

(Hartland et al., 2006)

Infinitely-Armed Bandit Solutions

10837...

-12-4-2-8…

Infinite-Armed Bandit SolutionsAdapting to Trees

…for treesK-

Failure

UCT-V ()UCB-V ()

UCT-V () AIRUCB-V () AIR

(Wang, Audibert, and Munos, 2006)

K-Failure

Random

UCT-FPU

1-Failure

3-Failure

10-Failure

Opti

mal arm

choic

es

(out

of

2000)

UCT-V ()

… … … … … … … … …

UCT-V () with AIR

… … … … … … … … …

UCT-V () with AIR: Gomba

Opti

mal arm

choic

es

(out

of

1000)

In Fuego

UCT UCT-V UCT-V (∞), β=.1

UCT-V (∞), β=.2

62

64

66

68

70

72

74

Fuego vs GnuGo, 40s moves

Upper 95% Con-fidence BoundLower 95% Con-fidence Bound

Search Algorithm

Win

% v

s G

nu

Go

Wrapping Up

Gomba• Fast• Scalable• Reasonably accurate• Simple• InterchangeableInfinite Bandit-Based Solutions• Improvement

Questions?

Köszönjük szépen!

Advisors• Kocsis Levente• Sárközy Gábor• Stanley Selkow

SZTAKI Info Lab• Benczúr András• Recski Gábor• Zséder Attila• Kornai András• Erdélyi Miklós

Fuego Development Team

top related