Random sampling via Markov chain
Shuji Kijima and Tomomi Matsui, Sep. 25, 2006
Source slides (63 pages): tcs.inf.kyushu-u.ac.jp/~kijima/slides/DMSSkijima.pdf

1. Random sampling via Markov chain
(Japanese title: マルコフ連鎖を用いたランダムサンプリング法, "random sampling via Markov chain" as a technique for random sampling)

Sep. 25, 2006

*Shuji Kijima (来嶋 秀治)1, Tomomi Matsui (松井 知己)2

1 The University of Tokyo (Graduate School of Information Science and Technology, Department of Mathematical Informatics)
2 Chuo University (Faculty of Science and Engineering, Department of Information Engineering)


-7. Sampling via Markov chain

Markov chain Monte Carlo (MCMC)

narrowly defined: a Monte Carlo method with sampling via a Markov chain

broadly: sampling via a Markov chain (the keyword of this talk)

In this talk, 'sampling' = 'random sampling'.

3. Markov chain

[Figure: an example of the probability transition diagram of a Markov chain on the states s1, s2, s3; the edge labels are transition probabilities.]

• Markov chain M (ergodic) is defined by
  state space: Ω = {s1, s2, s3} (finite)
  transition: transition probability matrix P, with Pij = Pr( i → j ) =: P( i, j )

4. Markov chain

[Figure: the same transition diagram with its underlying graph.]

• Markov chain M (ergodic)
  state space: Ω = {s1, s2, s3} (finite)
  transition: transition probability matrix P, with Pij = Pr( i → j ) =: P( i, j )

      (rows: current state s1, s2, s3; columns: next state s1, s2, s3)

        | 0.2  0.4  0.4 |
  P =   | 0.2  0.2  0.6 |
        | 0.1  0.3  0.6 |

5. Stationary distribution

[Figure: the same transition diagram with its underlying graph.]

• Markov chain M (ergodic)
  state space: Ω = {s1, s2, s3} (finite)
  transition: transition probability matrix P, with Pij = Pr( i → j ) =: P( i, j )
  stationary distribution π ⇔ π P = π, |π| = 1, π ≥ 0

        | 0.2  0.4  0.4 |
  P =   | 0.2  0.2  0.6 |
        | 0.1  0.3  0.6 |

  π = (1/7, 2/7, 4/7)
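The stationary distribution of this example can be checked numerically. The following NumPy sketch (not part of the slides) verifies π P = π and recovers π as the left eigenvector of P for eigenvalue 1:

```python
import numpy as np

# Transition matrix from the slide (rows: current state, columns: next state).
P = np.array([[0.2, 0.4, 0.4],
              [0.2, 0.2, 0.6],
              [0.1, 0.3, 0.6]])

# Claimed stationary distribution.
pi = np.array([1/7, 2/7, 4/7])

# pi P = pi, |pi| = 1, pi >= 0
assert np.allclose(pi @ P, pi)
assert np.isclose(pi.sum(), 1.0)

# pi is also the left eigenvector of P for eigenvalue 1 (the largest
# eigenvalue of a stochastic matrix), normalized to sum to 1.
w, v = np.linalg.eig(P.T)
stat = np.real(v[:, np.argmax(np.real(w))])
stat = stat / stat.sum()
print(stat)  # close to [1/7, 2/7, 4/7]
```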

6. Basic idea of "sampling via Markov chain"

[Figure: a huge state space; a trajectory x → x → … of transitions ends at the output sample.]

The output after many transitions is distributed asymptotically according to the stationary distribution.

1. Design a Markov chain whose stationary distribution = the target distribution.
2. Generate a sample from the stationary distribution after many transitions.
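As a sanity check on step 2, a short simulation (a sketch, not from the slides) runs the 3-state example chain for many transitions and compares the empirical distribution of the final state with π = (1/7, 2/7, 4/7):

```python
import random

# Transition matrix of the 3-state example chain.
P = [[0.2, 0.4, 0.4],
     [0.2, 0.2, 0.6],
     [0.1, 0.3, 0.6]]

def step(state, rng):
    # Move to the next state according to row `state` of P.
    return rng.choices([0, 1, 2], weights=P[state])[0]

rng = random.Random(0)
counts = [0, 0, 0]
for _ in range(10000):
    s = 0                    # arbitrary initial state
    for _ in range(30):      # "many" transitions
        s = step(s, rng)
    counts[s] += 1
freqs = [c / 10000 for c in counts]
print(freqs)  # close to (1/7, 2/7, 4/7) = (0.142..., 0.285..., 0.571...)
```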

7. Applications of MCMC

MCMC works powerfully for sampling-hard objects:
  • counting-hard
  • huge state space

It is easy to design a sampler for an objective distribution
  (cf. simulated annealing: Pr[global opt.] = 1, Pr[other solutions] = 0).

Many applications:
  • contingency tables
  • Ising model
  • permanent
  • spanning trees
  • numerical integration (Monte Carlo integration)
  • optimization

8. Ex. 1. Contingency table

A contingency table is a matrix of non-negative integers that satisfies (given) marginal sums, e.g., row sums (12, 18) and column sums (5, 4, 3, 7, 5, 6), with total 30.

Problem
  Given: marginal sums
  Output: a contingency table u.a.r. (uniformly at random)

In this talk we deal with this problem as an example of sampling via Markov chain.

9. Ex. 1. Contingency table

Problem
  Given: marginal sums
  Output: a contingency table u.a.r.
  (a matrix of non-negative integers satisfying the given marginal sums)

  table A             table B             table C
  5 4 3 0 0 0 | 12    0 0 0 1 5 6 | 12    4 3 1 3 1 0 | 12
  0 0 0 7 5 6 | 18    5 4 3 6 0 0 | 18    1 1 2 4 4 6 | 18
  5 4 3 7 5 6 | 30    5 4 3 7 5 6 | 30    5 4 3 7 5 6 | 30

10. Ex. 1. Contingency table

Problem
  Given: marginal sums
  Output: a contingency table u.a.r.
  (a matrix of non-negative integers satisfying the given marginal sums)

Counting the number of tables satisfying given marginal sums is #P-complete (hence NP-hard), even when the size of the table is 2 × n [Dyer, Kannan, and Mount '97].

⇒ sampling via Markov chain

11. Previous work on contingency tables

1985, Diaconis and Efron: exact test with a uniform sampler
1995, Diaconis and Saloff-Coste: approximate sampler for m* × n* tables
1997, Dyer, Kannan, and Mount: #P-completeness
2000, Dyer and Greenhill: approximate sampler for 2 × n tables
2002, Cryan et al.: approximate sampler for m* × n tables
2003, Kijima and Matsui: perfect sampler for 2 × n tables

Open problem
Is there a poly-time (approximate or perfect) sampler for m × n tables?

-6: Design of Stationary Distribution

Basic idea of "sampling via Markov chain"
1. Design a Markov chain whose stationary distribution = the target distribution.
2. Generate a sample from the stationary distribution after many transitions.

13. Design of a Markov chain with the target stationary distribution

A Markov chain M is ergodic iff
1. the state space is finite,
2. the transition matrix is irreducible (every pair of states is mutually reachable), and
3. aperiodic.
For an ergodic chain, the limit distribution = the stationary distribution.

Theorem
For a positive function f and a transition probability matrix P, if the detailed balance equations
  f(x) P(x, y) = f(y) P(y, x)  (for all x, y ∈ Ω)
hold, then the stationary distribution is π(x) = c f(x), where c is the normalizing constant.

e.g., Metropolis-Hastings, Gibbs sampler, heat-bath chain, etc.
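The theorem above is what Metropolis-type chains exploit. As an illustration (a sketch, not from the slides), the following Metropolis chain on {0, 1, 2} uses a symmetric proposal and acceptance probability min(1, f(y)/f(x)); it therefore satisfies the detailed balance equations for f = (1, 2, 4), and its stationary distribution is proportional to f:

```python
import random

f = [1.0, 2.0, 4.0]   # target: pi proportional to f, i.e. (1/7, 2/7, 4/7)

def metropolis_step(x, rng):
    y = rng.choice([0, 1, 2])                 # symmetric proposal
    if rng.random() < min(1.0, f[y] / f[x]):  # accept with min(1, f(y)/f(x))
        return y
    return x

rng = random.Random(1)
counts = [0, 0, 0]
x = 0
for _ in range(60000):
    x = metropolis_step(x, rng)
    counts[x] += 1
freqs = [c / 60000 for c in counts]
print(freqs)  # close to (1/7, 2/7, 4/7)
```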

14. Ex. 1. Markov chain for contingency tables [KM '06]

The transition rule is defined as follows.
1. Choose a consecutive pair of columns (j, j+1) u.a.r. (prob. 1/(n−1)).
2. Change the values of the cells in the (j, j+1)-th columns u.a.r. over the possible states.

e.g., current table, with columns (j, j+1) = (3, 4) chosen:
  4 3 [2 3] 0 0 | 12
  1 1 [1 4] 5 6 | 18
  5 4  3 7  5 6 | 30

The chosen 2×2 block with its marginals:
  2 3 |  5
  1 4 |  5
  3 7 | 10

The 4 possible states, each chosen with prob. 1/4:
  3 2    2 3    1 4    0 5
  0 5    1 4    2 3    3 2

i.e., add +k/−k to one row of the block and −k/+k to the other (requirement: non-negativity)
⇒ the marginal sums are preserved.
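A single transition of this chain can be sketched as follows (our own code, not the authors'); the possible states of the chosen 2×2 block are parametrized by its top-left cell:

```python
import random

# Sketch: one transition of the chain on 2 x n contingency tables;
# a table is a list of its two rows.
def transition(table, rng):
    n = len(table[0])
    j = rng.randrange(n - 1)              # choose columns (j, j+1) u.a.r.
    a, b = table[0][j], table[0][j + 1]
    r = a + b                             # first-row sum of the 2x2 block
    s1, s2 = a + table[1][j], b + table[1][j + 1]   # its column sums
    # Values of the top-left cell that keep all four cells non-negative:
    lo, hi = max(0, r - s2), min(s1, r)
    new_a = rng.randint(lo, hi)           # u.a.r. over the possible states
    table[0][j], table[0][j + 1] = new_a, r - new_a
    table[1][j], table[1][j + 1] = s1 - new_a, s2 - (r - new_a)

rng = random.Random(0)
X = [[4, 3, 2, 3, 0, 0], [1, 1, 1, 4, 5, 6]]
for _ in range(100):
    transition(X, rng)
print([sum(row) for row in X])                   # row sums: [12, 18]
print([X[0][j] + X[1][j] for j in range(6)])     # column sums preserved
```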

15. Our Markov chain for contingency tables [KM '06]

Theorem
Our Markov chain is ergodic, and the unique stationary distribution of the chain is the uniform distribution.

Proof of the latter claim: the detailed balance equations hold. Consider the following example of a pair (X, Y):

  X:  4 3 [2 3] 0 0 | 12      Y:  4 3 [0 5] 0 0 | 12
      1 1 [1 4] 5 6 | 18          1 1 [3 2] 5 6 | 18
      5 4  3 7  5 6 | 30          5 4  3 7  5 6 | 30

The transition probability from X to Y equals the transition probability from Y to X, since …

16. Our Markov chain for contingency tables [KM '06] (cont.)

(Same pair X, Y as the previous slide.)

… both transitions X → Y and Y → X first choose the same consecutive pair of columns u.a.r. (w.p. 1/(6−1) in this example), …

17. Our Markov chain for contingency tables [KM '06] (cont.)

… and, conditioned on columns (3, 4) being chosen, X and Y share the same 4 possible states, since the values in the other columns are the same:

  3 2    2 3    1 4    0 5
  0 5    1 4    2 3    3 2

Hence the detailed balance equations hold, and the stationary distribution is uniform.

18. Basic idea of sampling via Markov chain

[Figure: the state space; start from an arbitrary initial state, make several transitions, and output a sample.]

The output is 'approximately' according to the stationary distribution.

-5: Convergence Speed of Markov chain

Basic idea of "sampling via Markov chain"
1. Design a Markov chain whose stationary distribution = the target distribution.
2. Generate a sample from the stationary distribution after many transitions. ← the frontier of MCMC

20. "How many transitions do we need?"

If we have an approximate sampler (make the error sufficiently small),
• we have to estimate the mixing time, and
• bound the total variation distance.

If we have a perfect sampler (no error),
• we can output a sample exactly according to the stationary distribution,
  and we need not decide on an error rate.
• CFTP (Coupling From The Past) realizes a perfect sampler.

21. The mixing time of a Markov chain is defined as follows.

Ergodic chain M (state space Ω, transition matrix P, stationary distribution π); μ, ν: distributions on Ω.

Total variation distance:
  d_TV(μ, ν) = (1/2) Σ_{x∈Ω} |μ(x) − ν(x)|

Mixing time:
  τ(ε) = min { t : max_x d_TV(P^t(x, ·), π) ≤ ε }

[Figure: two distributions μ, ν on Ω; the gap between them is the error.]

M is rapidly mixing if τ(ε) ≤ poly(log |Ω|, ε⁻¹).
The t.v.d. is provably sufficiently small after the mixing time.
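For the 3-state example chain these quantities are easy to compute directly. The sketch below (not from the slides) tracks the total variation distance between the t-step distribution started at s1 and π, which shrinks geometrically:

```python
import numpy as np

# Total variation distance between the t-step distribution P^t(s1, .)
# and the stationary distribution pi, for the 3-state example.
P = np.array([[0.2, 0.4, 0.4],
              [0.2, 0.2, 0.6],
              [0.1, 0.3, 0.6]])
pi = np.array([1/7, 2/7, 4/7])

def tv_distance(mu, nu):
    return 0.5 * float(np.abs(mu - nu).sum())

dist = np.array([1.0, 0.0, 0.0])   # start deterministically at s1
tvs = []
for t in range(1, 11):
    dist = dist @ P
    tvs.append(tv_distance(dist, pi))
print(tvs[0], tvs[-1])  # the distance shrinks geometrically with t
```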

22. "How many transitions do we need?" (same slide as 20, this time emphasizing that a perfect sampler has NO ERROR)

-3. PERFECT SIMULATION

INTERMEZZO: a review of Propp and Wilson [1996].

The coupling from the past (CFTP)
- is an ingenious simulation of a Markov chain,
- which realizes perfect sampling: sampling from EXACTLY the limit distribution.

24. Update function, with an example

• An ergodic Markov chain MC
  finite state space: s1, s2, s3
  transition: [Figure: a transition diagram on s1, s2, s3 whose edge probabilities are 1/2, 1/3, and 1/6.]

25. Update function, with an example

• An ergodic Markov chain MC; finite state space: s1, s2, s3.

We consider determining the next state with
- a random number λ ∈ {1, …, 6} (u.a.r.), and
- an update function.

This table shows an update function (entry = next state, given the current state and λ):

        current state
  λ     s1   s2   s3
  1     s3   s1   s2
  2     s2   s3   s2
  3     s2   s1   s3
  4     s1   s1   s3
  5     s1   s2   s1
  6     s2   s3   s3

e.g., with λ = 5, the transition from time t to t+1 maps s1 → s1, s2 → s2, s3 → s1.
[Figure: an illustration of this transition.]
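In code, the slide's update table can be written down directly (a sketch); drawing λ uniformly from {1, …, 6} reproduces the chain's transition probabilities as multiples of 1/6:

```python
from collections import Counter

# Update table from the slide: PHI[lam] = (next from s1, next from s2, next from s3).
PHI = {
    1: ("s3", "s1", "s2"),
    2: ("s2", "s3", "s2"),
    3: ("s2", "s1", "s3"),
    4: ("s1", "s1", "s3"),
    5: ("s1", "s2", "s1"),
    6: ("s2", "s3", "s3"),
}
STATES = ("s1", "s2", "s3")

def update(state, lam):
    return PHI[lam][STATES.index(state)]

# Counting over all six lambda values recovers the transition probabilities.
for s in STATES:
    print(s, dict(Counter(update(s, lam) for lam in range(1, 7))))
# e.g. from s1: s1 twice, s2 three times, s3 once, i.e. (1/3, 1/2, 1/6)
```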

26. CFTP Algorithm and Theorem

Markov chain MC: Ω: finite state space; Φ_s^t(x, λ): transition rule; ergodic.

CFTP Algorithm (3 steps and a stopping condition)
1. Set T := −1; set λ: empty;
2. Generate λ[T]: random number; put λ := ( λ[T], λ[T+1], …, λ[−1] );
3. Start a chain from every element in Ω at period T, run MC with λ to period 0.
   a. if they coalesce (∃y∈Ω, ∀x∈Ω, y = Φ_T^0(x, λ)) ⇒ return y;
   b. otherwise, set T := T − 1; go to step 2.

CFTP Theorem
When the CFTP algorithm terminates, the returned value realizes a random variable exactly according to the stationary distribution.

In the following slides, I will illustrate the algorithm precisely.
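Putting the last two slides together, here is a compact CFTP implementation for the 3-state chain (our own sketch; states coded 0, 1, 2). The essential point is that the random numbers λ[t] are fixed once generated and reused in every later iteration. For this particular update table the chain happens to be doubly stochastic, so the returned samples are uniform:

```python
import random

# Update table from the slides, with states s1, s2, s3 coded as 0, 1, 2.
PHI = {1: (2, 0, 1), 2: (1, 2, 1), 3: (1, 0, 2),
       4: (0, 0, 2), 5: (0, 1, 0), 6: (1, 2, 2)}

def cftp(rng):
    lam = []        # lam[t] drives the step from time -(t+1) to time -t
    T = 1
    while True:
        while len(lam) < T:                # extend randomness into the past;
            lam.append(rng.randint(1, 6))  # older numbers are reused as-is
        states = {0, 1, 2}                 # start from every state at time -T
        for t in range(T - 1, -1, -1):
            states = {PHI[lam[t]][s] for s in states}
        if len(states) == 1:               # coalescence: unique state at time 0
            return states.pop()
        T += 1                             # go further into the past

rng = random.Random(2)
counts = [0, 0, 0]
for _ in range(3000):
    counts[cftp(rng)] += 1
print([c / 3000 for c in counts])  # roughly uniform for this chain
```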

27. Simulation from T = −1

1. Set T := −1; set λ: empty sequence.  (Step 1 is an initializing step.)

[Figure: time axis −6, …, −1, 0; states s1, s2, s3.]

28. Simulation from T = −1

2. Generate λ[T]: random number; put λ := ( λ[T], λ[T+1], …, λ[−1] ).

Here λ(−1) = 5.

[Figure: the random number 5 labels the step from time −1 to time 0.]

29. Simulation from T = −1

3. Start a chain from every element in Ω at period T, run MC with λ to period 0.

[Figure: with λ(−1) = 5, the chains from s1, s2, s3 at time −1 reach s1, s2, s1 at time 0.]

30. Simulation from T = −1

a. if they coalesce (∃y∈Ω, ∀x∈Ω, y = Φ_T^0(x, λ)) ⇒ return y;
b. otherwise, set T := T − 1; go to step 2.

'Coalescence' means that the state at time 0 is unique.
In this case the states do not coalesce ({s1, s2} at time 0), thus T := −2.

31. Simulation from T = −2

3. Start a chain from every element in Ω at period T, run MC with λ to period 0.
   a. if they coalesce ⇒ return y; b. otherwise, set T := T − 1; go to step 2.

32. Simulation from T = −2

2. Generate λ[T]: random number; put λ := ( λ[T], λ[T+1], …, λ[−1] ).

Here λ(−2) = 2. We will use the "5" from −1 to 0 again, generated in the previous iteration.

33. Simulation from T = −2

3. Start a chain from every element in Ω at period T, run MC with λ to period 0.

[Figure: with λ(−2) = 2, the chains from s1, s2, s3 at time −2 reach s2, s3, s2 at time −1; reusing λ(−1) = 5, they reach s2, s1, s2 at time 0.]

Note: the transitions from −1 to 0 equal the transitions in the previous iteration.
Still no coalescence ({s1, s2} at time 0).

34. Simulation from T = −3

[Figure: λ(−3) = 3; the chains from s1, s2, s3 at time −3 run with λ = (3, 2, 5) to time 0, again without coalescing.]

a. if they coalesce (∃y∈Ω, ∀x∈Ω, y = Φ_T^0(x, λ)) ⇒ return y;
b. otherwise, set T := T − 1; go to step 2.

35. Simulation from T = −4

[Figure: λ(−4) = 6; the chains from s1, s2, s3 at time −4 run with λ = (6, 3, 2, 5) to time 0.]

36. Simulation from T = −4

a. if they coalesce (∃y∈Ω, ∀x∈Ω, y = Φ_T^0(x, λ)) ⇒ return y;

With λ = (6, 3, 2, 5), all the chains coalesce to s2 at time 0 ⇒ output s2.

37. CFTP Algorithm and Theorem (with doubling)

1. Set T := −1; set λ: empty;
2. Generate λ[T], …, λ[T/2 − 1]: random numbers;
   put λ := ( λ[T], …, λ[T/2 − 1], λ[T/2], …, λ[−1] );
3. Start a chain from every element in Ω at period T, run MC with λ to period 0.
   a. if they coalesce (∃y∈Ω, ∀x∈Ω, y = Φ_T^0(x, λ)) ⇒ return y;
   b. otherwise, set T := 2T; go to step 2.

CFTP Theorem
When the CFTP algorithm terminates, the returned value realizes a random variable exactly according to the stationary distribution.

Markov chain MC: Ω: finite state space; Φ_s^t(x, λ): transition rule; ergodic.

What is the idea of CFTP?

38. Idea of CFTP (Coupling From the Past)

Suppose, imaginarily, an ergodic chain run from the infinite past.
Then the present state (the state at time 0) is EXACTLY according to the stationary distribution.

What is the present state?
Guess from the recent random transitions, obtained by considering the random numbers and the transitions with an update function.
⇒ Find the evidence of the present state.

39. Figure of CFTP

[Figure: time axis …, −6, −5, −4, −3, −2, −1, 0 with recent random numbers λ = (…, 4, 1, 6, 3, 2, 5); the chains from all states at time −4 coalesce to the unique state s2 at time 0.]

Suppose a Markov chain starting in the infinite past; the present state is then exactly according to the stationary distribution.
With the recent random numbers we can start chains at time −4 from all states.
Fortunately, we obtain a unique state at time 0; we call this situation 'coalescence'.
⇒ We obtain the present state (time 0) of the infinitely long run!!

40. CFTP Algorithm and Theorem (same as slide 37)

However, it is hard to start a chain from EVERY state: we cannot apply this algorithm directly, since we are concerned with huge state spaces.

-2: Perfect Sampler for two-rowed contingency tables [KM '06]

42. Rem. Markov chain for contingency tables [KM '06] (the transition rule of slide 14)

1. Choose a consecutive pair of columns (j, j+1) u.a.r. (prob. 1/(n−1)).
2. Change the values of the cells in the (j, j+1)-th columns u.a.r. over the possible states (requirement: non-negativity) ⇒ the marginal sums are preserved.

43. Update function [KM '06]

• Generate a random real λ ∈ [1, n); set j := ⌊λ⌋ and λ' := λ − ⌊λ⌋.
  (This chooses the pair of columns (j, j+1) with probability 1/(n−1).)
• Set X(1, j) := max{X'[1, j]} − ⌊θ λ'⌋, where θ is the number of possible states and max{X'[1, j]} is the largest possible value of cell (1, j).
  (This moves to one of the possible states for columns (j, j+1) with equal probability.)

e.g., for the 2×2 block [2 3 / 1 4] (marginals 5, 5 / 3, 7), the 4 possible states
  3 2    2 3    1 4    0 5
  0 5    1 4    2 3    3 2
correspond to λ' ∈ [0, 1/4), [1/4, 2/4), [2/4, 3/4), [3/4, 1), each with probability 1/4.
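This update rule can be sketched in code as follows (our interpretation of the slide, not the authors' implementation); a single random real λ ∈ [1, n) both picks the column pair and selects the new block:

```python
import math
import random

# Sketch of the deterministic update function phi(X, lam) for a 2 x n
# table, driven by a single random real lam in [1, n).
def phi(X, lam):
    j = int(math.floor(lam)) - 1          # columns (j, j+1), 0-indexed
    frac = lam - math.floor(lam)          # lam' = lam - floor(lam)
    a, b = X[0][j], X[0][j + 1]
    r = a + b                             # first-row sum of the block
    s1, s2 = a + X[1][j], b + X[1][j + 1] # column sums of the block
    hi, lo = min(s1, r), max(0, r - s2)
    theta = hi - lo + 1                   # number of possible states
    new_a = hi - int(math.floor(theta * frac))   # max{X'[1,j]} - floor(theta*lam')
    X[0][j], X[0][j + 1] = new_a, r - new_a
    X[1][j], X[1][j + 1] = s1 - new_a, s2 - (r - new_a)
    return X

rng = random.Random(0)
X = [[4, 3, 2, 3, 0, 0], [1, 1, 1, 4, 5, 6]]
n = 6
for _ in range(200):
    phi(X, 1 + rng.random() * (n - 1))    # lam uniform on [1, n)
print([sum(row) for row in X])            # marginal sums preserved: [12, 18]
```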

44. Sampling algorithm (monotone CFTP)

1. Set T := −1; set λ: empty sequence;
2. Generate λ[T], …, λ[T/2 − 1] ∈ [1, n): uniform random numbers;
   put λ := ( λ[T], …, λ[T/2 − 1], λ[T/2], …, λ[−1] );
3. Start chains from XU, XL at period T and simulate with λ until period 0;
   a. if they coalesce on Y ⇒ return Y;
   b. otherwise, set T := 2T; go to step 2.

We simulate just 2 chains:

  XU (N-W rule):  5 4 3 0 0 0 | 12      XL (N-E rule):  0 0 0 1 5 6 | 12
                  0 0 0 7 5 6 | 18                      5 4 3 6 0 0 | 18
                  5 4 3 7 5 6 | 30                      5 4 3 7 5 6 | 30

Theorem
The algorithm returns a random table EXACTLY according to the stationary distribution (here, the uniform distribution).
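The whole sampler then fits in a few lines (again our interpretation, not the authors' code; the update function phi is the sketch from the previous slide, and the tables XU, XL are the north-west and north-east corner tables for the example marginals):

```python
import math
import random

# Monotone CFTP for 2 x 6 tables with row sums (12, 18) and column sums
# (5, 4, 3, 7, 5, 6), simulating only the top and bottom chains.
def phi(X, lam):                          # update function, as on slide 43
    j = int(math.floor(lam)) - 1
    frac = lam - math.floor(lam)
    a, b = X[0][j], X[0][j + 1]
    r, s1, s2 = a + b, a + X[1][j], b + X[1][j + 1]
    hi, lo = min(s1, r), max(0, r - s2)
    new_a = hi - int(math.floor((hi - lo + 1) * frac))
    X[0][j], X[0][j + 1] = new_a, r - new_a
    X[1][j], X[1][j + 1] = s1 - new_a, s2 - (r - new_a)

def monotone_cftp(n, rng):
    lam = [1 + rng.random() * (n - 1)]    # lam[t] drives step -(t+1) -> -t
    T = 1
    while True:
        XU = [[5, 4, 3, 0, 0, 0], [0, 0, 0, 7, 5, 6]]   # maximum element
        XL = [[0, 0, 0, 1, 5, 6], [5, 4, 3, 6, 0, 0]]   # minimum element
        for t in range(T - 1, -1, -1):
            phi(XU, lam[t])
            phi(XL, lam[t])
        if XU == XL:                      # coalescence of the two chains
            return XU
        T *= 2                            # doubling: T := 2T
        while len(lam) < T:               # keep old randomness, add older
            lam.append(1 + rng.random() * (n - 1))

rng = random.Random(3)
Y = monotone_cftp(6, rng)
print(Y)   # a random table with the prescribed marginal sums
```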

45. The key points of the theorem are …

• Introduce a partial order on the state space.
• XU and XL are the maximum and the minimum, respectively.
• Any transition keeps the partial order.
⇒ Our Markov chain is a monotone Markov chain.

Claim [Propp and Wilson 1996]
Coalescence from XU and XL ⇔ coalescence from all states.

46. Def. cumulative sum vector

We represent a 2×n table X by a 'line', a piecewise linear function through the cumulative sum vector cX, defined by the partial sums of the first row:
  cX(i) = Σ_{j ≤ i} X(1, j)   (i = 0, 1, …, n).
The map X → cX is a bijection (the second row is recovered from the column sums).

e.g., X = [4 3 1 3 1 0 / 1 1 2 4 4 6] is represented by cX = (0, 4, 7, 8, 11, 12, 12).

[Figure: the piecewise linear plot of cX(i) for i = 0, …, 6, with values from 0 to 12.]
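The bijection is a two-liner in code (a sketch, not from the slides):

```python
from itertools import accumulate

# The cumulative sum vector c_X of a 2 x n table (partial sums of the
# first row), and the inverse map given the column sums.
def cumsum_vector(X):
    return [0] + list(accumulate(X[0]))

def table_from_cumsum(c, col_sums):
    row1 = [c[i + 1] - c[i] for i in range(len(col_sums))]
    row2 = [s - a for s, a in zip(col_sums, row1)]
    return [row1, row2]

X = [[4, 3, 1, 3, 1, 0], [1, 1, 2, 4, 4, 6]]
c = cumsum_vector(X)
print(c)                                   # [0, 4, 7, 8, 11, 12, 12]
assert table_from_cumsum(c, [5, 4, 3, 7, 5, 6]) == X   # bijection
```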

47. Transition (represented by a line)

In the line representation, a transition chooses an index and changes the point of the line only at that index.

e.g., λ = 3.1 (so j = 3):
  X:   4 3 1 3 1 0 | 12      X':  4 3 0 4 1 0 | 12
       1 1 2 4 4 6 | 18           1 1 3 3 4 6 | 18
       5 4 3 7 5 6 | 30           5 4 3 7 5 6 | 30

cX = (0, 4, 7, 8, 11, 12, 12) changes only at i = 3, to (0, 4, 7, 7, 11, 12, 12).

[Figure: the two lines, differing only at i = 3.]

48. Def. Partial order

X ≥ Y ⇔ cX(i) ≥ cY(i) for all i ∈ {0, …, n},
i.e., the line of X (red) is nowhere below the line of Y (green).

e.g., for
  X:  4 3 3 2 0 0 | 12      Y:  4 1 2 3 2 0 | 12
      1 1 0 5 5 6 | 18          1 3 1 4 3 6 | 18
we have cX = (0, 4, 7, 10, 12, 12, 12) ≥ cY = (0, 4, 5, 7, 10, 12, 12), so X ≥ Y.

[Figure: the two lines; red is above green.]
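The order test is a direct comparison of the cumulative sum vectors (a sketch, not from the slides):

```python
from itertools import accumulate

# The slide's partial order: X >= Y iff c_X(i) >= c_Y(i) for all i.
def cumsum_vector(X):
    return [0] + list(accumulate(X[0]))

def geq(X, Y):
    return all(a >= b for a, b in zip(cumsum_vector(X), cumsum_vector(Y)))

X = [[4, 3, 3, 2, 0, 0], [1, 1, 0, 5, 5, 6]]
Y = [[4, 1, 2, 3, 2, 0], [1, 3, 1, 4, 3, 6]]
print(geq(X, Y), geq(Y, X))   # True False
```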

49. Max and min of the poset

  XU (N-W rule):  5 4 3 0 0 0 | 12      XL (N-E rule):  0 0 0 1 5 6 | 12
                  0 0 0 7 5 6 | 18                      5 4 3 6 0 0 | 18
                  5 4 3 7 5 6 | 30                      5 4 3 7 5 6 | 30

Lemma
XU ≥ X ≥ XL for every state X: every line is between red (= XU) and green (= XL).

[Figure: the lines of XU, XL, and an intermediate table.]

50. Key lemma

Lemma
Any transition keeps the partial order, i.e., for every pair (X, Y) with X ≥ Y, φ(X, λ) ≥ φ(Y, λ).

If red is above green, then, e.g., after a transition:
  X':  4 3 3 0 2 0 | 12      Y':  4 3 2 2 1 0 | 12
       1 1 0 7 3 6 | 18           1 1 1 5 4 6 | 18

[Figure: the lines of X' and Y'.]

51. Key lemma (cont.)

Lemma
Any transition keeps the partial order, i.e., for every pair (X, Y) with X ≥ Y, φ(X, λ) ≥ φ(Y, λ).

If red is above green, the two lines never twist (cross) after a transition.

[Figure: the lines of X' and Y' after the transition.]

52. The key points of the theorem (repeated from slide 45).

53. Fig. of coalescence

Every state is between the red line (= XU) and the green line (= XL):

  XU:  5 4 3 0 0 0 | 12    XL:  0 0 0 1 5 6 | 12    X:  4 3 1 3 1 0 | 12
       0 0 0 7 5 6 | 18         5 4 3 6 0 0 | 18        1 1 2 4 4 6 | 18

[Figure: the lines fX(i) of the three tables.]

54. Fig. of coalescence

Any transition preserves the partial order ⇒ every state stays between red and green:

  XU:  5 4 0 3 0 0 | 12    XL:  0 0 0 1 5 6 | 12    X:  4 3 0 4 1 0 | 12
       0 0 3 4 5 6 | 18         5 4 3 6 0 0 | 18        1 1 3 3 4 6 | 18

[Figure: the lines after one transition.]

55. Fig. of coalescence

Any transition preserves the partial order ⇒ every state stays between red and green:

  XU:  5 4 0 2 1 0 | 12    XL:  0 0 0 4 2 6 | 12    X:  4 3 0 3 2 0 | 12
       0 0 3 5 4 6 | 18         5 4 3 3 3 0 | 18        1 1 3 4 3 6 | 18

[Figure: the lines after further transitions; the gap narrows.]

56. Fig. of coalescence

Any transition preserves the partial order ⇒ every state stays between red and green:

  XU:  3 3 2 1 2 1 | 12    XL:  1 2 2 3 2 2 | 12    X:  3 3 1 2 2 1 | 12
       2 1 1 6 3 5 | 18         4 2 1 4 3 4 | 18        2 1 2 5 3 5 | 18

[Figure: the three lines, now close together.]

57. Fig. of coalescence

All three chains reach the same table:

  4 0 0 1 3 4 | 12
  1 4 3 6 2 2 | 18
  5 4 3 7 5 6 | 30

Claim
Coalescence from XU and XL ⇔ coalescence from all states.

[Figure: the three lines coincide.]

-1: Concluding Remarks

59. Expected running time of our perfect sampler

• Expected running time of the CFTP algorithm = O( E[T*] )  (T*: coalescence time)
• Coalescence time of monotone CFTP (Propp and Wilson '96):
  E[T*] ≤ 2 τ (1 + ln D)
  (τ: mixing rate, D: the distance between the max and the min, in a special distance)
• Mixing rate of our chain: τ = n² (n − 1) ln(n⋅K), and D ≤ n⋅K (by path coupling)
  (n: # of columns, K: the sum total of the table)

Claim
The expected coalescence time is O( n³ ln(n⋅K) )  (proof omitted),
under the condition that the column sum vector s satisfies s1 ≥ s2 ≥ ⋯ ≥ sn (the 'single server model').

60. Discussion

Monotone Markov chains:
• Ising model (e.g., restoration of monochromatic pictures), Potts model, hard-core model
• tiling
• 2-rowed contingency tables
• queueing networks

Another perfect sampling algorithm:
• rooted spanning trees

Improvement of memory space:
• read-once algorithm [Wilson 2000]

61. References

• O. Haeggstroem, "Finite Markov Chains and Algorithmic Applications," London Mathematical Society Student Texts 52, Cambridge University Press, 2002.
• S. Kijima and T. Matsui, "Let's sample perfectly!" (完璧にサンプリングしよう!), Operations Research (オペレーションズ・リサーチ), 50 (2005):
  Part 1, "From the distant past," no. 3, 169-174;
  Part 2, "Between heaven and earth," no. 4, 264-269;
  Part 3, "A future with an end," no. 5, 329-334.
  http://www.simplex.t.u-tokyo.ac.jp/~kijima/ (downloadable from the "materials" page of Kijima's website)

62. Future work

• Apply the monotone CFTP to sampling-hard objects:
  ⇒ design a Markov chain ⇒ design a perfect sampler ⇒ estimate the coalescence time
  e.g., m × n contingency tables
• New algorithms for perfect sampling

0. The end

Thank you for your attention.

— all of your views coalesce.