
Bayesian Networks

Unit 7 Approximate Inference in Bayesian Networks


Wang, Yuan-Kai (王元凱), ykwang@mails.fju.edu.tw

http://www.ykwang.tw

Department of Electrical Engineering, Fu Jen University (輔仁大學電機工程系)

2006~2011

Reference this document as: Wang, Yuan-Kai, "Approximate Inference in Bayesian Networks," Lecture Notes of Wang, Yuan-Kai, Fu Jen University, Taiwan, 2011.


Goal of This Unit
• P(X|e) inference for Bayesian networks
• Why approximate inference
  – Exact inference is too slow because of exponential complexity
• Using approximate approaches
  – Sampling methods
    • Likelihood weighting sampling
    • Markov chain Monte Carlo sampling
  – Loopy belief propagation
  – Variational method


Related Units
• Background
  – Probabilistic graphical model
  – Exact inference in BN
• Next units
  – Probabilistic inference over time


Self-Study References
• Chapter 14, Artificial Intelligence: A Modern Approach, 2nd ed., S. Russell & P. Norvig, Prentice Hall, 2003.
• Inference in Bayesian networks, B. D'Ambrosio, AI Magazine, 1999.
• Probabilistic Inference in Graphical Models, M. I. Jordan & Y. Weiss.
• An Introduction to MCMC for Machine Learning, C. Andrieu, N. de Freitas, A. Doucet, & M. I. Jordan, Machine Learning, vol. 50, pp. 5-43, 2003.
• Computational Statistics Handbook with MATLAB, W. L. Martinez and A. R. Martinez, Chapman & Hall/CRC, 2002
  – Chapter 3: Sampling Concepts
  – Chapter 4: Generating Random Variables


Structure of Related Lecture Notes
[Diagram: problem → PGM representation → inference (query); data → learning]
• Representation: Unit 5: BN; Unit 9: Hybrid BN; Units 10~15: Naïve Bayes, MRF, HMM, DBN, Kalman filter
• Inference: Unit 6: Exact inference; Unit 7: Approximate inference; Unit 8: Temporal inference
• Learning (structure learning, parameter learning): Units 16~: MLE, EM
• Example network: B → A ← E, A → J, A → M, with CPTs P(B), P(E), P(A|B,E), P(J|A), P(M|A)


Contents
1. Sampling
2. Random Number Generator
3. Stochastic Simulation
4. Markov Chain Monte Carlo
5. Loopy Belief Propagation
6. Variational Methods
7. Implementation
8. Summary
9. References


4 Steps of Inference
• Step 1: Bayes' theorem
  $P(X \mid E{=}e) = \frac{P(X, E{=}e)}{P(E{=}e)} \propto P(X, E{=}e)$
• Step 2: Marginalization
  $P(X, E{=}e) = \sum_{h \in H} P(X, E{=}e, H{=}h)$
• Step 3: Conditional independence
  $P(X, E{=}e) = \sum_{h \in H} \prod_{i=1}^{n} P(X_i \mid Pa(X_i))$
• Step 4: Product-sum computation (enumeration)
  – Exact inference
  – Approximate inference


Five Types of Queries in Inference
• For a probabilistic graphical model G
• Given a set of evidence E=e
• Query the PGM with
  – P(e): Likelihood query
  – arg max P(e): Maximum likelihood query
  – P(X|e): Posterior belief query
  – arg max_x P(X=x|e): Maximum a posteriori (MAP) query (single query variable)
  – arg max_{x1,...,xk} P(X1=x1, ..., Xk=xk|e): Most probable explanation (MPE) query


Approximate Inference vs. Exact Inference
• Exact inference: P(X|E) = 0.71828
  – Gets the exact probability value
  – Uses the inference steps derived from probabilistic formulas
  – Needs exponential time complexity
• Approximate inference: P(X|E) ≈ 0.71
  – Gets an approximate probability value
  – Uses sampling theory
  – Needs only polynomial time complexity: fast computation


Why Approximate Inference
• Large treewidth
  – Large, highly connected graphical models
  – Treewidth may be large (>40) even in sparse networks
• In many applications, approximations are sufficient
  – Example: P(X=x|e) = 0.3183098861
  – Maybe P(X=x|e) ≈ 0.3 is a good enough approximation
  – e.g., we take action only if P(X=x|e) > 0.5


1. Sampling
• 1.1 What Is Sampling
• 1.2 Sampling for Inference


Basic Idea of Sampling
• Why sampling
  – Estimate some values by random number generation
1. Sampling
  – Random number generation
  – Draw N samples S from a known distribution P (i.e., generate N random numbers)
2. Estimation
  – Compute an approximate probability P̂(X|E) that approximates the real posterior probability P(X|E)


1.1 What Is Sampling
• A very simple example with one random variable: a coin toss
  – Tossing the coin yields head or tail
  – It is a Boolean R.V.: Coin = head or tail
  – If the coin is unbiased, head and tail have equal probability
    • A prior probability distribution P(Coin) = <0.5, 0.5>
    • Uniform distribution
  – Assume we have a coin but we do not know whether it is unbiased


Sampling of Coin Toss
• Sampling in this example = flipping the coin many times, N
  – e.g., N = 1000 times
  – One flip yields one sample
  – Ideally, 500 heads, 500 tails
    • P(head) = 500/1000 = 0.5
      P(tail) = 500/1000 = 0.5
  – Practically, 501 heads, 499 tails
    • P(head) = 501/1000 = 0.501
      P(tail) = 499/1000 = 0.499
• After the sampling,
  – We can estimate the probability distribution
  – And check whether the coin is biased


Sampling & Estimation (Math)
• For a Boolean random variable X
  – P(X) is the prior distribution = <P(x), P(¬x)>
  – Use a sampling algorithm to generate N samples
  – Let N(x) be the number of samples in which x is true, and N(¬x) the number in which x is false
  – Then
    $\hat{P}(x) = \frac{N(x)}{N}, \quad \hat{P}(\neg x) = \frac{N(\neg x)}{N}$
    $\lim_{N \to \infty} \frac{N(x)}{N} = P(x), \quad \lim_{N \to \infty} \frac{N(\neg x)}{N} = P(\neg x)$


1.2 Sampling for Inference
• Given a Bayesian network G including (X1, ..., Xn)
  – We get a joint probability distribution
    $P(X_1, \ldots, X_n) = \prod_i P(X_i \mid Pa(X_i))$
• For a query P(X|E=e)
  – $P(X \mid e) \propto \sum_{h} \prod_i P(X_i \mid Pa(X_i))$
  – It is hard to compute
    • Needs time exponential in the number of Xi
  – We will try to use sampling to compute it


Compute P(X|e) by Sampling
• Sampling
  – Generate N samples of $P(X_1, \ldots, X_n) = \prod_i P(X_i \mid Pa(X_i))$
• Estimation
  – Use the N samples to estimate P(X,e) ≈ N(X,e)/N
  – Use the N samples to estimate P(e) ≈ N(e)/N
  – Estimate P(X|e) by P(X,e)/P(e)
(Explained in Sections 2, 3, 4)


What Is a Sampling Algorithm
• An algorithm to
  – Generate samples from a known probability distribution P
  – Estimate the approximate probability P̂


Various Sampling Algorithms
• Stochastic simulation (Section 3)
  – Direct sampling
  – Rejection sampling
    • Reject samples disagreeing with the evidence
  – Likelihood weighting
    • Use the evidence to weight samples
• Markov chain Monte Carlo (MCMC) (Section 4)
  – Sample from a stochastic process whose stationary distribution is the true posterior


2. Random Number Generator
• Very important for sampling algorithms
• Introduces basic concepts related to sampling of Bayesian networks
• Subsections
  – 2.1 Univariate
  – 2.2 Multivariate


RNG in Programming Languages
• Random number generator (RNG)
  – C/C++: rand()
  – Java: random()
  – Matlab: rand()
• Why should we discuss it?
  – These generate random numbers with a uniform distribution
  – How do we generate
    • Gaussian, ...
    • Multivariate, dependent random variables
    • Non-closed-form distributions?


Generate a Random Number (1/2)
• Examples in C
  – int i = rand();
  – Returns 0 ~ RAND_MAX (at least 32767)
  – It generates integers
• Generate a random number between 1 and n (n < RAND_MAX)
  – int i = 1 + ( rand() % n );
  – (rand() % n) returns a number between 0 and n-1
  – Add 1 to make the random number lie between 1 and n
  – It generates integers, not real numbers


Generate a Random Number (2/2)
• Ex: integer between 1 and 6
  – 1 + ( rand() % 6 )
• Ex: real number between 0 and 1
  – double i = (double)rand() / RAND_MAX;
    (the cast avoids integer division, which would almost always yield 0)
• Exercise
  – Real number between 10 and 20


Generate Many Random Numbers Repeatedly
• Use a loop for repeated generation
  – for (int i=0; i<1000; i++)
    { rand(); }
  – int i, j[1000];
    for (i=0; i<1000; i++)
    { j[i] = 1 + rand() % 6; }
• rand() generates each number uniformly: uniform distribution


Why Generate Random Numbers
• Simulate random behavior
• Make random decisions
• Estimate some values


Random Behavior/Decision (1/2)
• Flip a coin for a decision (Boolean)
  – Fair: each face has equal probability
  – int coin_face;
    if (rand() > RAND_MAX/2) coin_face = 1;
    else coin_face = 0;
  – int coin_face;
    coin_face = rand() % 2;


Random Behavior/Decision (2/2)
• Random decision among multiple choices
  – Discrete random variable
• Ex: roll a die
  – Fair: each face has equal probability (uniform distribution)
  – int die_face; // Random variable
    die_face = rand() % 6;


Estimation
• If we can simulate a random behavior
• We can estimate some values
  – First, we repeat the random behavior
  – Then we estimate the value


Example: The Coin Toss
• Flip the coin 1000 times to estimate the fairness of the coin
  – int coin_face; // Random variable
    int frequency[2] = {0, 0};
    for (i=0; i<1000; i++)
    { coin_face = rand() % 2;
      frequency[coin_face]++;
    }
[Figure: histogram of frequency over coin_face ∈ {0, 1}; uniform distribution]


Example: Area of Circle (Estimation)
• double x, y; // Two random variables
  int N=1000, NCircle=0;
  double Area;
  for (i=0; i<N; i++)
  { x = (double)rand() / RAND_MAX;
    y = (double)rand() / RAND_MAX;
    if ( (x*x + y*y) <= 1 )
      NCircle = NCircle + 1;
  }
  Area = 4.0 * NCircle / N;
  (x, y, and Area must be doubles; with integer division the estimate would always be 0)
• Is one call to rand() a random number?
• x and y are independent
• We call (x, y) a sample


Multiple Dependent Random Variables
• Markov chain: n random variables
  X1 → ... → Xk → ... → Xn
• Bayesian network: 5 random variables
  Burglary → Alarm ← Earthquake, Alarm → JohnCalls, Alarm → MaryCalls
• The variables are dependent
• What is a sample?


Sampling
• It is to randomly generate a sample
  – For a random variable X, or a set of random variables X1, ..., Xn
    • Boolean, discrete, continuous
    • Univariate, multivariate
      – Independent, dependent
  – According to a probability distribution P(X)
    • Discrete X: histogram
    • Continuous X:
      – Uniform, Gaussian, or
      – Any distribution: Gaussian mixture models


Sub-Sections for Generating a Sample
• 2.1 Univariate
  – Uniform, Gaussian, Gaussian mixture
• 2.2 Multivariate
  – Uniform
  – Gaussian
    • Independent, dependent
  – Any distribution
    • Gaussian mixture
      – Independent, dependent
    • Bayesian network


2.1 Univariate
• For a random variable X
  – Boolean, discrete, continuous, hybrid
• We know P(X) is
  – Uniform, Gaussian, or Gaussian mixture
• Generate a sample x according to P(X)


Uniform Generator
• Every programming language provides a rand()/random() function to generate a uniformly distributed number
  – Integer number within [0, MAX)
• Sampling a Boolean uniform number
  – rand() % 2
• Sampling a discrete uniform number within [0, d)
  – rand() % d
• Sampling a continuous uniform number
  – Within [0, 1): (double)rand() / MAX
  – Within [a, b): a + ((double)rand() / MAX) * (b - a)


Example: Uniform Generator
• x=rand(1,10000);
• h=hist(x,20);
• bar(h);
[Figure: bar chart of the 20 histogram bins, each holding roughly 500 counts: a flat, uniform profile]


Gaussian Generator (1/2)
• Sampling a Gaussian can be built from the uniform distribution
• There are functions in C/Java/Matlab to randomly generate a univariate Gaussian real number with (μ, σ) = (0, 1)
  – C: Numerical Recipes in C
  – Java: Random.nextGaussian()
  – Matlab: randn()
• Suppose it is called Gaussian()
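A concrete way to build such a Gaussian() from the uniform generator alone is the classic Box-Muller transform. The C sketch below is an illustrative assumption, not the routine used by Numerical Recipes or randn():

  #include <math.h>
  #include <stdlib.h>

  /* Box-Muller transform: map two uniform numbers on (0,1]
     to one standard normal sample (mu=0, sigma=1). */
  double Gaussian(void)
  {
      /* +1.0 shifts rand() into (0,1], so log(u1) stays finite */
      double u1 = (rand() + 1.0) / ((double)RAND_MAX + 1.0);
      double u2 = (rand() + 1.0) / ((double)RAND_MAX + 1.0);
      return sqrt(-2.0 * log(u1)) * cos(2.0 * 3.14159265358979 * u2);
  }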


Gaussian Generator (2/2)
• Sampling a continuous Gaussian number with (μ, σ)
  – (Gaussian() * σ) + μ
• Sampling a discrete Gaussian number with (μ, σ)?


Example: Gaussian Generator (1/2)
• Pseudo code
  – Assume Gaussian() is a pseudo function that generates Gaussian numbers
  – double x[10000];
    for (i=0; i<10000; i++)
      x[i] = Gaussian();
  – for (i=0; i<10000; i++)
      x[i] = μ + Gaussian() * σ;


Example: Gaussian Generator (2/2)
• Matlab
  – x=randn(1,10000);
  – h=hist(x,20);
  – bar(h);
• Java
  – Random r = new Random();
    double[] x = new double[10000];
    for (i=0; i<10000; i++)
      x[i] = r.nextGaussian();
[Figure: bell-shaped histogram of the 10000 samples, peaking around 1600 counts in the central bins]


Gaussian Mixture Generator (1/2)
• Random variable X with a Gaussian
  – P(X) = N(X; μ, σ)
• Random variable Y with a Gaussian mixture
  – $P(Y) = \sum_m \pi_m N(Y; \mu_m, \sigma_m)$


Gaussian Mixture Generator (2/2)
• Generate N samples of X
  – for (i=0; i<N; i++)
      x[i] = (Gaussian() * σ) + μ;
• Generate N samples of Y with a mixture of M Gaussians (see the sketch below)
  – Each Gaussian m has π_m, μ_m, σ_m
  – for (m=0; m<M; m++)
      for (i=0; i<N*π_m; i++)
        y[m][i] = (Gaussian() * σ_m) + μ_m;
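The loop above allocates exactly N·π_m samples to component m. A more literal reading of the mixture density is to draw the component index at random for every sample; a minimal C sketch, reusing the Gaussian() helper assumed above:

  /* Sample one value from a mixture of M univariate Gaussians:
     first choose component m with probability pi[m], then
     sample from N(mu[m], sigma[m]). */
  double sample_mixture(int M, const double pi[],
                        const double mu[], const double sigma[])
  {
      double u = (double)rand() / RAND_MAX;   /* uniform in [0,1] */
      double cum = 0.0;
      int m;
      for (m = 0; m < M - 1; m++) {
          cum += pi[m];
          if (u <= cum) break;                /* component m chosen */
      }
      return mu[m] + Gaussian() * sigma[m];
  }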


Example: Gaussian Mixture Generator
• N=10000; pi1=0.8; pi2=0.2;
• mu1=0; mu2=15; sigma1=3; sigma2=5;
• x1 = mu1 + randn(1,N*pi1) * sigma1;
• x2 = mu2 + randn(1,N*pi2) * sigma2;
• x = [x1, x2];
• h=hist(x,50);
• bar(h);
[Figure: bimodal histogram with a tall mode near mu1=0 and a smaller, wider mode near mu2=15]


2.2 Multivariate
• For random variables X1, ..., Xn
  – Boolean, discrete, continuous, hybrid
• We know P(X1, ..., Xn) is
  – Uniform, Gaussian, Gaussian mixture, or any distribution
• Generate a sample (x1, ..., xn) according to P(X1, ..., Xn)
  – Independent
  – Dependent


Multivariate Boolean Uniform Generator
• Boolean random variables X1, ..., Xn
• int X[n]; // A sample
  for (i=0; i<n; i++)
    X[i] = rand() % 2;


Multivariate Discrete Uniform Generator
• Discrete random variables X1, ..., Xn
  – Each with d discrete values: [0, d-1]
  – Each Xi is uniformly distributed
  – X1, ..., Xn must be independent
• int X[n]; // A sample
  for (i=0; i<n; i++)
    X[i] = rand() % d;


Multivariate Gaussian Generator - Independent (1/2)
• Pseudo code
• For n random variables X = (X1, ..., Xn)
  – Gaussian: N(X; μ, Σ)
    • Mean vector: μ
    • Covariance matrix: Σ = [σij]
• X1, ..., Xn are independent
  – σij = 0 for i ≠ j
• Generating a sample of X = generating each Xi independently


Multivariate Gaussian Generator - Independent (2/2)
• Generate a sample of X = (X1, ..., Xn) with μi = 0, σii = 1, σij = 0 for i ≠ j
  – double X[n]; // a sample
    for (i=0; i<n; i++)
      X[i] = Gaussian();
• Generate a sample of X = (X1, ..., Xn) with μi ≠ 0, σii ≠ 1, σij = 0 for i ≠ j
  – double X[n]; // a sample
    for (i=0; i<n; i++)
      X[i] = μi + Gaussian() * σii;


Example – Matlab (1/2)
• Plot the density of a 2-D Gaussian with μ_X = (0,0)^T and Σ_X = [1 0; 0 1]:
  mx=[0 0]';
  Cx=[1 0; 0 1];
  x1=-3:0.1:3;
  x2=-3:0.1:3;
  for i=1:length(x1),
    for j=1:length(x2),
      f(i,j)=(1/(2*pi*det(Cx)^(1/2)))*exp((-1/2)*([x1(i) x2(j)]-mx')*inv(Cx)*([x1(i);x2(j)]-mx));
    end
  end
  mesh(x1,x2,f)
  pause;
  contour(x1,x2,f)
  pause


Example – Matlab (2/2)
• Randomly generate 1000 samples for X with μ_X = (0,0)^T, Σ_X = [1 0; 0 1]:
  y1=randn(1,1000);
  y2=randn(1,1000);
  plot(y1,y2,'.');


Multivariate Gaussian Generator - Dependent (1/4)
• For n random variables X = (X1, ..., Xn)
  – Gaussian: N(X; μ, Σ)
    • Mean vector: μ
    • Covariance matrix: Σ = [σij]
  – Σ is a positive definite matrix
    • Symmetric, and all eigenvalues (pivots) > 0
  – For a general matrix A: A = LDU
    • L: lower triangular, U: upper triangular, D: diagonal matrix of pivots
  – For a symmetric matrix S: S = LDL^T
  – For a positive definite matrix:
    $\Sigma = LDL^T = (L\sqrt{D})(\sqrt{D}L^T) = PP^T$
  – This is called the Cholesky decomposition
• X1, ..., Xn are dependent
  – σij ≠ 0


Multivariate Gaussian Generator - Dependent (2/4)
• Generate a sample of X with μ, Σ
  – Perform the Cholesky decomposition of Σ
    • The Cholesky decomposition is the pivot decomposition for a positive definite matrix
    • Σ = PP^T
  – Generate independent Gaussian Y = (Y1, ..., Yn) with μi = 0, σi = 1
  – X = PY + μ


Multivariate Gaussian Generator - Dependent (3/4)
• Pseudo code to generate a sample of X with μ, Σ
  – Matrix Sigma;
    Vector mu;
    Vector X(n), Y(n); // a sample
    Matrix P = chol(Sigma); // Cholesky decomposition
    for (i=0; i<n; i++) Y(i) = Gaussian();
    X = P * Y + mu;


Multivariate Gaussian Generator - Dependent (4/4)
• Proof
  – For n random variables X = (X1, ..., Xn) with μ, Σ
  – Generate n independent, zero-mean, unit-variance normal random variables Y = (Y1, ..., Yn):
    $\mu_Y = (0, \ldots, 0)^T, \quad \Sigma_Y = I$
  – Take X = PY + μ, where Σ = PP^T
  – Covariance matrix of X:
    $E[(X-\mu)(X-\mu)^T] = E[(PY)(PY)^T] = P\,E[YY^T]\,P^T = P I P^T = PP^T = \Sigma$


Example – Matlab (1/4)
• Assume
  $\mu_X = (0,0)^T, \quad \Sigma_X = \begin{pmatrix} 1 & 1/2 \\ 1/2 & 1 \end{pmatrix} = PP^T, \quad P = \begin{pmatrix} 1 & 0 \\ 1/2 & \sqrt{3}/2 \end{pmatrix}$
• Matlab:
  mx=[0 0]';
  Cx=[1 1/2; 1/2 1];
  P=chol(Cx)';
  (Matlab's chol returns the upper-triangular factor R with R'R = Cx, so the lower-triangular P is its transpose)


Example – Matlab (2/4)
• Randomly generate 1000 samples for X with μ_X = (0,0)^T, Σ_X = [1 1/2; 1/2 1]
• mx=zeros(2,1000);
  y1=randn(1,1000);
  y2=randn(1,1000);
  y=[y1;y2];
  P=[1, 0; 1/2, sqrt(3)/2];
  x=P*y+mx;
  x1=x(1,:);
  x2=x(2,:);
  plot(x1,x2,'.');
  r=corrcoef(x1',x2');


Example – Matlab (3/4)
• Assume
  $\mu_X = (5,5)^T, \quad \Sigma_X = \begin{pmatrix} 1 & 9/10 \\ 9/10 & 1 \end{pmatrix} = PP^T, \quad P = \begin{pmatrix} 1 & 0 \\ 9/10 & \sqrt{19}/10 \end{pmatrix}$
• Matlab:
  mx=[5 5]';
  Cx=[1 9/10; 9/10 1];
  P=chol(Cx)';


Example – Matlab (4/4)
• Randomly generate 1000 samples for X with μ_X = (5,5)^T, Σ_X = [1 9/10; 9/10 1]
• mx=5*ones(2,1000);
  y1=randn(1,1000);
  y2=randn(1,1000);
  y=[y1;y2];
  P=[1, 0; 9/10, sqrt(19)/10];
  x=P*y+mx;
  x1=x(1,:);
  x2=x(2,:);
  plot(x1,x2,'.');
  r=corrcoef(x1',x2');


Multivariate Gaussian Mixture Generator
• Generate N samples of X with a mixture of M Gaussians (Matlab-like pseudo code)
  – for (m=0; m<M; m++)
    { Matrix P = chol(Σ_m); // Cholesky decomposition
      for (i=0; i<N*π_m; i++)
      { // Generate n independent normally distributed R.V.s (μ=0, σ=1)
        y = randn(n, 1);
        // Transform y into x
        x = P * y + μ_m;
      }
    }


Example – Matlab (1/4)
• Combine the previous two Gaussians with π1 = 0.5, π2 = 0.5:
  $\mu_1 = (0,0)^T, \quad \Sigma_1 = \begin{pmatrix} 1 & 1/2 \\ 1/2 & 1 \end{pmatrix}; \qquad \mu_2 = (5,5)^T, \quad \Sigma_2 = \begin{pmatrix} 1 & 9/10 \\ 9/10 & 1 \end{pmatrix}$
[Figure: scatter plot of the mixture samples, two equal-sized clusters centered at (0,0) and (5,5)]


Example – Matlab (2/4)
• pi1=0.5; pi2=0.5; N=2000;
  mx1=zeros(2,pi1*N); Cx1=[1 1/2; 1/2 1];
  P1=chol(Cx1)'; %P1=[1, 0; 1/2, sqrt(3)/2];
  y1_1=randn(1,pi1*N); y1_2=randn(1,pi1*N);
  y1=[y1_1;y1_2];
  x1=P1*y1+mx1; x1_1=x1(1,:); x1_2=x1(2,:);

  mx2=5*ones(2,pi2*N); Cx2=[1 9/10; 9/10 1];
  P2=chol(Cx2)'; %P2=[1, 0; 9/10, sqrt(19)/10];
  y2_1=randn(1,pi2*N); y2_2=randn(1,pi2*N);
  y2=[y2_1;y2_2];
  x2=P2*y2+mx2; x2_1=x2(1,:); x2_2=x2(2,:);

  z1=[x1_1,x2_1]; z2=[x1_2,x2_2];
  plot(z1,z2,'.');


Example – Matlab (3/4)
• Combine the previous two Gaussians with π1 = 0.2, π2 = 0.8:
  $\mu_1 = (0,0)^T, \quad \Sigma_1 = \begin{pmatrix} 1 & 1/2 \\ 1/2 & 1 \end{pmatrix}; \qquad \mu_2 = (5,5)^T, \quad \Sigma_2 = \begin{pmatrix} 1 & 9/10 \\ 9/10 & 1 \end{pmatrix}$
[Figure: scatter plot; the cluster at (5,5) now holds about four times as many samples as the one at (0,0)]


Example – Matlab (4/4)
• pi1=0.2; pi2=0.8; N=2000;
  mx1=zeros(2,pi1*N); Cx1=[1 1/2; 1/2 1];
  P1=chol(Cx1)'; %P1=[1, 0; 1/2, sqrt(3)/2];
  y1_1=randn(1,pi1*N); y1_2=randn(1,pi1*N);
  y1=[y1_1;y1_2];
  x1=P1*y1+mx1; x1_1=x1(1,:); x1_2=x1(2,:);

  mx2=5*ones(2,pi2*N); Cx2=[1 9/10; 9/10 1];
  P2=chol(Cx2)'; %P2=[1, 0; 9/10, sqrt(19)/10];
  y2_1=randn(1,pi2*N); y2_2=randn(1,pi2*N);
  y2=[y2_1;y2_2];
  x2=P2*y2+mx2; x2_1=x2(1,:); x2_2=x2(2,:);

  z1=[x1_1,x2_1]; z2=[x1_2,x2_2];
  plot(z1,z2,'.');


Exercise
• Write a program to randomly generate 1000 samples of a 3-dimensional Gaussian with μ = (5, 10, -3), Σ = (2,1,3; 4,2,2; 3,1,2)
  (note: as stated this Σ is not symmetric; a covariance matrix must be symmetric positive definite, so symmetrize it first)


Any Distribution
• For random variables X1, ..., Xn
  – Boolean, discrete, continuous, hybrid
• We know P(X1, ..., Xn) has no closed-form formula
  – Independent: P(X1, ..., Xn) = P(X1) ... P(Xn)
  – Dependent: $P(X_1, \ldots, X_n) = \prod_i P(X_i \mid Parent(X_i))$
• Generate a sample (x1, ..., xn) according to P(X1, ..., Xn)
  – Independent: generate each Xi by P(Xi)
  – Dependent: generate each Xi by P(Xi | Parent(Xi))


Two Boolean R.V.s - Independent
• X1, X2 have distributions:
  – P(X1) = <0.67, 0.33>, P(X2) = <0.75, 0.25>
• int X1, X2;
  for (i=0; i<1000; i++)
  { if (rand() > RAND_MAX/3)
      X1 = 1;
    else X1 = 0;
    if (rand() > RAND_MAX/4)
      X2 = 1;
    else X2 = 0;
  }
[Tables: P(X1=1) = 0.67; P(X2=1) = 0.75]


Two Boolean R.V.s - Dependent
• X1, X2 have distributions:
  – P(X1) = <0.67, 0.33>
  – P(X2|X1=T) = <0.75, 0.25>, P(X2|X1=F) = <0.8, 0.2>
• Generate a sample (x1, x2)
  if (rand() > RAND_MAX/3) x1 = 1;
  else x1 = 0;
  if (x1==1)
    if (rand() > RAND_MAX/4) x2 = 1;
    else x2 = 0;
  else // x1==0
    if (rand() > RAND_MAX/5) x2 = 1;
    else x2 = 0;


Markov Chain
• Markov chain: n random variables
  X1 → ... → Xk → ... → Xn


Bayesian Network
• Example: 5 random variables
  Burglary → Alarm ← Earthquake, Alarm → JohnCalls, Alarm → MaryCalls


3. Stochastic Simulation
• Also called
  – Monte Carlo methods
  – Sampling methods
• Sub-sections
  – 3.1 Direct sampling
  – 3.2 Rejection sampling
  – 3.3 Likelihood weighting


3.1 Direct Sampling
• Generate N samples randomly
• For the inference P(X|E)
  – P(X|E) = P(X^E) / P(E)
  – Get N(E) & N(X^E) from the N samples
    • N(E): no. of samples with E
    • N(X^E): no. of samples with X and E
  – P(E) ≈ N(E)/N, P(X^E) ≈ N(X^E)/N
  – P(X|E) ≈ N(X^E)/N(E)


Example (1/4)
• For the sprinkler network
  – Estimate P(r|¬w) by direct sampling
  – 4 random variables
  – A sample = (c, s, r, w)


Example (2/4)
• Generate 1000 samples

  Cloudy  Sprinkler  Rain  WetGrass
  T       T          T     F
  F       T          T     F
  F       F          T     T
  T       T          T     F
  T       T          T     F
  ...     ...        ...   ...
  F       T          T     F


Example (3/4)
• P(r|¬w) = P(r, ¬w)/P(¬w) ≈ N(r^¬w) / N(¬w)
  – N(¬w): no. of samples with WetGrass=False
  – N(r^¬w): no. of samples with Rain=True & WetGrass=False
  (counted over the table of 1000 samples above)


Example (4/4)
• P(R|¬w)
  – = P(R, ¬w)/P(¬w)
  – = < P(r^¬w)/P(¬w), P(¬r^¬w)/P(¬w) >
  (again counted over the same table of samples)


How to Generate a Sample for the Bayesian Network? (1/3)
• The sprinkler Bayesian network
• Assume a sampling order: [Cloudy, Sprinkler, Rain, WetGrass]
• A sample is an atomic event:
  (cloudy, sprinkler, rain, wetgrass) = (T, F, T, T)


How to Generate a Sample for the Bayesian Network? (2/3)
• int C, S, R, W;
  for (i=0; i<1000; i++)
  { if (rand() > RAND_MAX/2) C = T;
    else C = F;
    if (rand() > RAND_MAX/2) S = T;
    else S = F;
    if (rand() > RAND_MAX/2) R = T;
    else R = F;
    if (rand() > RAND_MAX/2) W = T;
    else W = F;
  }
• Incorrect implementation: every variable is sampled with probability 0.5, ignoring the CPTs and each variable's parents


How to Generate a Sample for the Bayesian Network? (3/3)
• int C, S, R, W;
  for (i=0; i<1000; i++)
  { if (rand() > RAND_MAX/2) C = T;
    else C = F;
    if (C==T)
      if (rand() > RAND_MAX*0.9) S = T;
      else S = F;
    else // C==F
      if (rand() > RAND_MAX/2) S = T;
      else S = F;
    ...
  }


An Example Generating One Sample (1/8)
• The sampling algorithm
  1. Sample from P(Cloudy) = <0.5, 0.5>
     – Suppose it returns true
  2. Sample from P(Sprinkler|Cloudy=true) = <0.1, 0.9>
     – Suppose it returns false
  3. Sample from P(Rain|Cloudy=true) = <0.8, 0.2>
     – Suppose it returns true
  4. Sample from P(WetGrass|Sprinkler=false, Rain=true) = <0.9, 0.1>
     – Suppose it returns true


An Example Generating One Sample (2/8)
Samples: (C, S, R, W) = (-, -, -, -)


An Example Generating One Sample (3/8)
Random sampling: Cloudy
Return: Cloudy=true
Samples: (C, S, R, W) = (c, -, -, -)


An Example Generating One Sample (4/8)
Random sampling: 1. Sprinkler, 2. Rain, given Cloudy=true
Samples: (C, S, R, W) = (c, -, -, -)


An Example Generating One Sample (5/8)
Random sampling: Sprinkler, given Cloudy=true
Return: Sprinkler=false
Samples: (C, S, R, W) = (c, ¬s, -, -)


An Example Generating One Sample (6/8)
Random sampling: Rain, given Cloudy=true
Return: Rain=true
Samples: (C, S, R, W) = (c, ¬s, r, -)


An Example Generating One Sample (7/8)
Random sampling: WetGrass, given Rain=true, Sprinkler=false
Samples: (C, S, R, W) = (c, ¬s, r, -)


An Example Generating One Sample (8/8)
Random sampling: WetGrass, given Rain=true, Sprinkler=false
Return: WetGrass=true
Samples: (C, S, R, W) = (c, ¬s, r, w)


The Algorithm (1/2)
• To generate one sample (Prior-Sample; see the sketch below)
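The Prior-Sample pseudocode figure is not reproduced in this transcript, so here is a C sketch of the same idea for the sprinkler network. CPT entries not shown on these slides (e.g., P(Rain|¬cloudy) and the remaining WetGrass rows) are taken from the standard AIMA sprinkler network, and sample_bernoulli() is an assumed helper:

  #include <stdbool.h>
  #include <stdlib.h>

  /* Assumed helper: returns true with probability p. */
  bool sample_bernoulli(double p)
  {
      return rand() < p * ((double)RAND_MAX + 1.0);
  }

  /* Prior-Sample sketch: sample each variable in topological
     order, conditioning on the already-sampled parent values. */
  void prior_sample(bool *c, bool *s, bool *r, bool *w)
  {
      *c = sample_bernoulli(0.5);              /* P(Cloudy) */
      *s = sample_bernoulli(*c ? 0.1 : 0.5);   /* P(Sprinkler|Cloudy) */
      *r = sample_bernoulli(*c ? 0.8 : 0.2);   /* P(Rain|Cloudy) */
      double pw = (*s && *r) ? 0.99            /* P(WetGrass|S,R) */
                : (*s || *r) ? 0.90
                : 0.0;
      *w = sample_bernoulli(pw);
  }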


The Algorithm (2/2)
• In the previous example
  – We got one sample [true, false, true, true] of the Bayesian network using Prior-Sample
• The sampling of a Bayesian network
  – Repeat the sampling N times
  – We get N samples
• We can use the N samples to compute any query probability in the Bayesian network


How It Works (1/2)
• Why can any probability be answered from the sampling?
  – The N samples effectively form a full joint distribution table (FJD)

  Samples:        FJD:
  C S R W         C S R W  P
  T T T F         T T T F  0.02
  F T T F         F T T F  0.13
  F F T T         F F T T  0.04
  T T F F         T T F F  0.15
  ...             ...


Why It Works (2/2)
• A sample is an atomic event (x1, ..., xn)
• P(x1, ..., xn) ≈ N(x1, ..., xn) / N
• Therefore, an FJD is generated from the N samples
• Note: usually N << 2^n, so many atomic events receive few or no samples


Exercise: Direct Sampling
• Network: smart → prepared ← study; smart, prepared, fair → pass
  p(smart) = .8, p(study) = .6, p(fair) = .9

  p(prep|...)   smart   ¬smart
  study          .9      .7
  ¬study         .5      .1

  p(pass|...)       smart           ¬smart
                prep   ¬prep    prep   ¬prep
  fair           .9      .7      .7      .2
  ¬fair          .1      .1      .1      .1

• Query: What is the probability that a student studied, given that they pass the exam?


Problems of Direct Sampling
• It needs to generate very many samples in order to obtain an approximate FJD
• For a query of conditional probability P(X|e)
  – Can we just approximate the conditional probability?
  – Yes, the following two algorithms do this


3.2 Rejection Sampling
• $\hat{P}(X \mid e)$ is estimated from the samples agreeing with e
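A minimal C sketch of this idea for the query P(Rain|Sprinkler=true), reusing the prior_sample() helper sketched earlier; samples whose Sprinkler value disagrees with the evidence are simply discarded:

  /* Rejection-sampling estimate of P(Rain=true | Sprinkler=true). */
  double reject_sample_rain(int N)
  {
      int agree = 0, rain_true = 0;
      for (int i = 0; i < N; i++) {
          bool c, s, r, w;
          prior_sample(&c, &s, &r, &w);
          if (!s) continue;        /* reject: disagrees with evidence */
          agree++;
          if (r) rain_true++;
      }
      return agree > 0 ? (double)rain_true / agree : 0.0;
  }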


An Example
• Estimate P(Rain|Sprinkler=true) using 100 samples
  – 27 samples have Sprinkler=true
  – Of these, 8 have Rain=true and 19 have Rain=false
  – P(Rain|Sprinkler=true) ≈ Normalize(<8, 19>) = <0.296, 0.704>
• Similar to a basic real-world empirical estimation procedure


Analysis of Rejection Sampling
• $\hat{P}(X \mid e) = \alpha\, N(X, e) = \frac{N(X, e)}{N(e)} \approx \frac{P(X, e)}{P(e)} = P(X \mid e)$
• Hence rejection sampling returns consistent posterior estimates
• Problem: expensive if P(e) is small
  – P(e) drops off exponentially with the number of evidence variables!


3.3 Likelihood Weighting
• Avoids the inefficiency of rejection sampling
  – By generating only events consistent with the evidence variables e
• Idea
  – Fix the evidence variables
  – Randomly sample only the hidden variables (to generate a sample event)
  – Weight each sample event by the likelihood it accords the evidence
    • Events have different weights


An Example (1/9)
• Query P(Rain|sprinkler, wetgrass)


An Example (2/9)
1. Set the weight ω = 1.0
2. Sample from P(Cloudy) = <0.5, 0.5>
   • Suppose it returns true
3. The evidence Sprinkler=true, so we set
   ω = ω × P(sprinkler|cloudy) = 1.0 × 0.1 = 0.1
4. Sample from P(Rain|cloudy) = <0.8, 0.2>
   • Suppose it returns true
5. The evidence WetGrass=true, so we set
   ω = ω × P(wetgrass|sprinkler, rain) = 0.1 × 0.99 = 0.099
• Result: a sample event (true, true, true, true) with weight 0.099


An Example (3/9)
ω = 1.0


An Example (4/9)
ω = 1.0


An Example (5/9)
ω = 1.0


An Example (6/9)
ω = 1.0 × 0.1


An Example (7/9)
ω = 1.0 × 0.1


An Example (8/9)
ω = 1.0 × 0.1


An Example (9/9)
ω = 1.0 × 0.1 × 0.99 = 0.099


The Algorithm (1/2)
• The example generates one sample event (true, true, true, true) for the query P(Rain|sprinkler, wetgrass)
• Repeat the sampling N times
  – We get N sample events
  – Each event has a likelihood weight ω
  – ω1 = sum of ω over events with rain=true, ω2 = sum of ω over events with rain=false
• P(Rain|sprinkler, wetgrass) = < ω1/(ω1+ω2), ω2/(ω1+ω2) >


The Algorithm (2/2)
[Figure: Weighted-Sample pseudocode; see the sketch below]
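The Weighted-Sample pseudocode figure is not reproduced here, so the C sketch below applies the idea directly to the running query P(Rain|sprinkler, wetgrass). It assumes the standard AIMA sprinkler CPTs and the sample_bernoulli() helper from the earlier sketch:

  /* Likelihood-weighting estimate of P(Rain=true | sprinkler, wetgrass):
     evidence variables are fixed and multiply into the weight;
     hidden variables are sampled from their CPTs. */
  double lw_rain(int N)
  {
      double w_rain = 0.0, w_not = 0.0;
      for (int i = 0; i < N; i++) {
          double w = 1.0;
          bool c = sample_bernoulli(0.5);            /* sample Cloudy */
          w *= c ? 0.1 : 0.5;                        /* evidence: Sprinkler=true */
          bool r = sample_bernoulli(c ? 0.8 : 0.2);  /* sample Rain|Cloudy */
          w *= r ? 0.99 : 0.90;                      /* evidence: WetGrass=true,
                                                        with Sprinkler=true */
          if (r) w_rain += w; else w_not += w;
      }
      return w_rain / (w_rain + w_not);   /* = omega1/(omega1+omega2) */
  }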


Exercise: Likelihood Weighting
• Network: smart → prepared ← study; smart, prepared, fair → pass
  p(smart) = .8, p(study) = .6, p(fair) = .9

  p(prep|...)   smart   ¬smart
  study          .9      .7
  ¬study         .5      .1

  p(pass|...)       smart           ¬smart
                prep   ¬prep    prep   ¬prep
  fair           .9      .7      .7      .2
  ¬fair          .1      .1      .1      .1

• Query: What is the probability that a student studied, given that they pass the exam?


Analysis (1/3)
• Why does the algorithm work for P(X|E=e)?
• Let the sampling probability for Weighted-Sample be S_WS
  – The evidence variables E are fixed with e
  – All the other variables are Z = {X} ∪ Y
  – The algorithm samples each variable in Z given its parent values:
    $S_{WS}(z, e) = \prod_{i=1}^{l} P(z_i \mid parents(Z_i))$


Analysis (2/3)
• The likelihood weight w for a given sample (z, e) = (x, y, e) is
  $w(z, e) = \prod_{i=1}^{m} P(e_i \mid parents(E_i))$
• The weighted probability of a sample (z, e) = (x, y, e) is
  $S_{WS}(z, e)\, w(z, e) = \prod_{i=1}^{l} P(z_i \mid parents(Z_i)) \prod_{i=1}^{m} P(e_i \mid parents(E_i)) = P(x, y, e)$
  because, by the chain rule of Bayesian networks,
  $P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid parents(X_i))$


Analysis (3/3)
$\hat{P}(x \mid e) = \alpha \sum_{y} N_{WS}(x, y, e)\, w(x, y, e)$
$\approx \alpha' \sum_{y} S_{WS}(x, y, e)\, w(x, y, e)$
$= \alpha' \sum_{y} P(x, y, e)$
$= \alpha' P(x, e) = P(x \mid e)$
• So the algorithm works


Discussions
• Likelihood weighting is efficient because it uses all the samples generated
• However, it suffers a degradation in performance as the number of evidence variables increases, because
  – Most samples will have very low weights
  – The weighted estimate will be dominated by the tiny fraction of samples that accord more than an infinitesimal likelihood to the evidence


4. Inference by MCMC
• Key idea
  – Treat the sampling process as a Markov chain
    • The next sample depends on the previous one
  – Can approximate any posterior distribution
• "State" of the network = current assignment to all variables
• Generate the next state by sampling one variable given its Markov blanket
• Sample each variable in turn, keeping the evidence fixed


The Markov Chain
• With Sprinkler=true, WetGrass=true, there are four states:
[Figure: the four states (Cloudy, Rain) ∈ {T, F}² with transition arrows between them]


Markov Blanket Sampling
• The Markov blanket of Cloudy is
  – Sprinkler and Rain
• The Markov blanket of Rain is
  – Cloudy, Sprinkler, and WetGrass
• The probability given the Markov blanket is calculated as follows:
  $P(x'_i \mid MB(X_i)) \propto P(x'_i \mid Parents(X_i)) \prod_{Z_j \in Children(X_i)} P(z_j \mid Parents(Z_j))$


An Example (1/2)
• Estimate P(Rain|sprinkler, wetgrass)
• Loop for N times
  – Sample Cloudy or Rain given its Markov blanket
• Count the number of times Rain=true and Rain=false in the samples


An Example (2/2)
• E.g., visit 100 states
  – 31 have Rain=true
  – 69 have Rain=false
• P(Rain|sprinkler, wetgrass) = Normalize(<31, 69>) = <0.31, 0.69>


The Algorithm
[Figure: MCMC sampling pseudocode; see the sketch below]
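The pseudocode figure is not reproduced here; the C sketch below performs Gibbs sampling for P(Rain|sprinkler, wetgrass), computing each Markov-blanket distribution with the formula from the slide above. The CPT entries are those of the standard AIMA sprinkler network, and sample_bernoulli() is the helper assumed earlier:

  /* Gibbs-sampling estimate of P(Rain=true | sprinkler, wetgrass).
     Nonevidence variables: Cloudy, Rain. */
  double gibbs_rain(int N)
  {
      bool c = true, r = true;    /* arbitrary initial state */
      int rain_count = 0;
      for (int i = 0; i < N; i++) {
          /* Resample Cloudy given blanket {Sprinkler=t, Rain}:
             P(c|mb) is proportional to P(c) P(s|c) P(r|c) */
          double pc_t = 0.5 * 0.1 * (r ? 0.8 : 0.2);
          double pc_f = 0.5 * 0.5 * (r ? 0.2 : 0.8);
          c = sample_bernoulli(pc_t / (pc_t + pc_f));
          /* Resample Rain given blanket {Cloudy, Sprinkler=t, WetGrass=t}:
             P(r|mb) is proportional to P(r|c) P(w|s,r) */
          double pr_t = (c ? 0.8 : 0.2) * 0.99;
          double pr_f = (c ? 0.2 : 0.8) * 0.90;
          r = sample_bernoulli(pr_t / (pr_t + pr_f));
          if (r) rain_count++;
      }
      return (double)rain_count / N;
  }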


Why It Works
• Skipped
  – Details on pp. 517-518 of the AIMA 2e textbook


Sub-Sections
• 4.1 Markov chain theory
• 4.2 Two MCMC sampling algorithms


4.1 Markov Chain Theory
• Suppose X1, X2, ... take some set of values
  – w.l.o.g., these values are 1, 2, ...
• A Markov chain is a process that corresponds to the network:
  X1 → X2 → X3 → ... → Xn → ...
• To quantify the chain, we need to specify
  – The initial probability: P(X1)
  – The transition probability: P(Xt+1|Xt)
• A Markov chain has stationary transition probability: P(Xt+1|Xt) is the same for all times t


Irreducible Chains
• A state j is accessible from state i if there is an n such that P(Xn = j | X1 = i) > 0
  – There is a positive probability of reaching j from i after some number of steps
• A chain is irreducible if every state is accessible from every state


Ergodic Chains
• A state i is positively recurrent if there is a finite expected time to get back to state i after being in state i
  – If X has a finite number of states, it suffices that i is accessible from itself
• A chain is ergodic if it is irreducible and every state is positively recurrent


(A)periodic Chains
• A state i is periodic if there is an integer d such that P(Xn = i | X1 = i) = 0 whenever n is not divisible by d
  – Intuition: state i may occur only every d steps
• A chain is aperiodic if it contains no periodic state


Stationary Probabilities
Thm:
• If a chain is ergodic and aperiodic, then the limit
  $\lim_{n \to \infty} P(X_n = j \mid X_1 = i)$
  exists and does not depend on i
• Moreover, let
  $P^*(X = j) = \lim_{n \to \infty} P(X_n = j \mid X_1 = i)$
  Then P*(X) is the unique probability satisfying
  $P^*(X = j) = \sum_{i} P(X_{t+1} = j \mid X_t = i)\, P^*(X_t = i)$


Stationary Probabilities
• The probability P*(X) is the stationary probability of the process
• Regardless of the starting point, the process will converge to this probability
• The rate of convergence depends on properties of the transition probability


Sampling from the Stationary Probability
• This theory suggests how to sample from the stationary probability:
  – Set X1 = i, for some random/arbitrary i
  – For t = 1, 2, ..., n
    • Sample a value xt+1 for Xt+1 from P(Xt+1|Xt = xt)
  – Return xn
• If n is large enough, then this is a sample from P*(X)
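A tiny C sketch of this procedure for a two-state chain; the transition matrix T is an illustrative assumption, with T[i][j] = P(Xt+1 = j | Xt = i):

  /* Simulate a two-state Markov chain for n steps and return the
     final state -- approximately a draw from P*(X) for large n. */
  int markov_chain_sample(const double T[2][2], int x1, int n)
  {
      int x = x1;
      for (int t = 1; t <= n; t++) {
          double u = (double)rand() / RAND_MAX;
          x = (u < T[x][0]) ? 0 : 1;   /* sample next state from row x */
      }
      return x;
  }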


Designing Markov Chains
• How do we construct the right chain to sample from?
  – Ensuring aperiodicity and irreducibility is usually easy
• The problem is ensuring the desired stationary probability


Designing Markov Chains
Key tool:
• If the transition probability satisfies the detailed-balance condition
  $Q(X = i)\, P(X_{t+1} = j \mid X_t = i) = Q(X = j)\, P(X_{t+1} = i \mid X_t = j)$
  whenever $P(X_{t+1} = j \mid X_t = i) > 0$,
  then P*(X) = Q(X)
• This gives a local criterion for checking that the chain will have the right stationary distribution


MCMC Methods
• We can use these results to sample from P(X1,...,Xn|e)
Idea:
• Construct an ergodic & aperiodic Markov chain such that
  P*(X1,...,Xn) = P(X1,...,Xn|e)
• Simulate the chain for n steps to get a sample


MCMC Methods
Notes:
• The Markov chain variable Y takes as values assignments to all variables that are consistent with the evidence
• For simplicity, we will denote such a state using the vector of variables:
  $V(Y) = \{\, V(X_1), \ldots, V(X_n) \mid x_1, \ldots, x_n \text{ satisfy } e \,\}$


4.2 Two MCMC Sampling Algorithms
• Gibbs sampler
• Metropolis-Hastings sampler


Gibbs Sampler
• One of the simplest MCMC methods
• Each transition changes the state of only one Xi
• The transition probability is defined by P itself, as a stochastic procedure:
  – Input: a state x1,...,xn
  – Choose i at random (uniform probability)
  – Sample x'i from P(Xi|x1, ..., xi-1, xi+1, ..., xn, e)
  – Let x'j = xj for all j ≠ i
  – Return x'1,...,x'n


Correctness of Gibbs Sampler
• How do we show correctness?


Correctness of Gibbs Sampler
• By the chain rule,
  $P(x_1, \ldots, x_{i-1}, x_i, x_{i+1}, \ldots, x_n \mid e) = P(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n \mid e)\, P(x_i \mid x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n, e)$
• Thus, we get
  $\frac{P(x_1, \ldots, x_{i-1}, x_i, x_{i+1}, \ldots, x_n \mid e)}{P(x_1, \ldots, x_{i-1}, x'_i, x_{i+1}, \ldots, x_n \mid e)} = \frac{P(x_i \mid x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n, e)}{P(x'_i \mid x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n, e)}$
  where the right-hand side is the ratio of the transition probabilities
• Since we choose i from the same distribution at each stage, this procedure satisfies the ratio (detailed-balance) criterion


Gibbs Sampling for Bayesian Networks
• Why is the Gibbs sampler "easy" in BNs?
• Recall that the Markov blanket of a variable separates it from the other variables in the network
  – P(Xi | X1,...,Xi-1, Xi+1,...,Xn) = P(Xi | Mbi)
• This property allows us to use local computations to perform the sampling in each transition


Gibbs Sampling in Bayesian Networks
• How do we evaluate P(Xi | x1,...,xi-1, xi+1,...,xn)?
• Let Y1, ..., Yk be the children of Xi
  – By the definition of Mbi, the parents of each Yj are in Mbi ∪ {Xi}
• It is easy to show that
  $P(x_i \mid Mb_i) = \frac{P(x_i \mid Pa_i) \prod_j P(y_j \mid pa_{Y_j})}{\sum_{x'_i} P(x'_i \mid Pa_i) \prod_j P(y_j \mid pa'_{Y_j})}$


Metropolis-Hastings
• More general than Gibbs (Gibbs is a special case of M-H)
• The proposal distribution is an arbitrary q(x'|x) that is ergodic and aperiodic (e.g., uniform)
• Transition to x' happens with probability
  $\alpha(x' \mid x) = \min\!\left(1, \frac{P(x')\, q(x \mid x')}{P(x)\, q(x' \mid x)}\right)$
• Useful when P(x) can be evaluated only up to a normalizing constant, since the constant cancels in the ratio
• q(x'|x) = 0 implies P(x') = 0 or q(x|x') = 0
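A generic C sketch of one M-H transition over a discrete state space. The unnormalized target p_unnorm() and the proposal pair q_density()/q_sample() are assumed caller-supplied helpers, purely for illustration:

  /* One Metropolis-Hastings step: propose x' ~ q(.|x) and accept
     it with probability min(1, P(x')q(x|x') / (P(x)q(x'|x))). */
  int mh_step(int x,
              double (*p_unnorm)(int),
              double (*q_density)(int to, int from),
              int (*q_sample)(int from))
  {
      int xp = q_sample(x);
      double a = (p_unnorm(xp) * q_density(x, xp)) /
                 (p_unnorm(x)  * q_density(xp, x));
      double u = (double)rand() / RAND_MAX;
      return (u < a) ? xp : x;   /* u < a also covers the a >= 1 case */
  }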


Sampling Strategy
• How do we collect the samples?
Strategy I:
• Run the chain M times, each for n steps
  – Each run starts from a different starting point
• Return the last state in each run
[Figure: M chains run in parallel]


Sampling Strategy
Strategy II:
• Run one chain for a long time
• After some "burn-in" period, sample points every fixed number of steps
[Figure: a "burn-in" period followed by M samples from one chain]


Comparing Strategies
Strategy I:
  – Better chance of "covering" the space of points, especially if the chain is slow to reach stationarity
  – Have to perform "burn-in" steps for each chain
Strategy II:
  – Perform "burn-in" only once
  – Samples might be correlated (although only weakly)
Hybrid strategy:
  – Run several chains, sampling a few times from each
  – Combines the benefits of both strategies


Short Summary - Approximate Inference
• Monte Carlo (sampling) methods:
  – Pro: simplicity of implementation and a theoretical guarantee of convergence
  – Con: can be slow to converge, and their convergence is hard to diagnose
• Variational methods – your presentation
• Loopy belief propagation and generalized belief propagation – your presentation


Exercise: MCMC Sampling
• Network: smart → prepared ← study; smart, prepared, fair → pass
  p(smart) = .8, p(study) = .6, p(fair) = .9

  p(prep|...)   smart   ¬smart
  study          .9      .7
  ¬study         .5      .1

  p(pass|...)       smart           ¬smart
                prep   ¬prep    prep   ¬prep
  fair           .9      .7      .7      .2
  ¬fair          .1      .1      .1      .1

• Query: What is the probability that a student studied, given that they pass the exam?


Main Computational Problems
1. It is difficult to tell whether convergence has been achieved
2. It can be wasteful if the Markov blanket is large
   – P(Xi|MB(Xi)) won't change much (law of large numbers)


5. Loopy Belief Propagation
• TBU


6. Variational Methods
• TBU


7. Implementation by PNL

  Algorithm             PNL                   GeNIe
  Enumeration           -                     v (Naïve)
  Variable Elimination  -                     -
  Belief Propagation    v (Pearl)             v (Polytree)
  Junction Tree         v                     v (Clustering)
  Direct Sampling       -                     v (Logic)
  Likelihood Sampling   v (LWSampling)        v (Likelihood sampling)
  MCMC Sampling         v (GibbsWithAnneal)   (Other 5 samplings)


8. Summary

• Exact inference by variable elimination
  – Polytime on polytrees
  – NP-hard on general graphs
  – Space cost comparable to time cost; very sensitive to topology


Summary

• Approximate inference by LW and MCMC
  – LW does poorly when there is a lot of (downstream) evidence
  – LW and MCMC are generally insensitive to topology
  – Convergence can be very slow when probabilities are close to 0 or 1
  – Both can handle arbitrary combinations of discrete and continuous variables


Summary

• What we know
  – What a Bayesian network is
  – How to do inference, given a Bayesian network
• However, we still need to know
  – How to learn CPTs
  – How to build, or automatically learn, the structure of a Bayesian network from a given set of data


9. References

• General introduction to probabilistic inference in BN
  – B. D'Ambrosio, "Inference in Bayesian Networks," AI Magazine, 1999.
  – M. I. Jordan and Y. Weiss, "Probabilistic Inference in Graphical Models."
  – C. Andrieu, N. de Freitas, A. Doucet, and M. I. Jordan, "An Introduction to MCMC for Machine Learning," Machine Learning, vol. 50, pp. 5-43, 2003.


Recent Books

• R. E. Neapolitan, Learning Bayesian Networks, Prentice Hall, 2004.
• C. Borgelt and R. Kruse, Graphical Models: Methods for Data Analysis and Mining, Wiley, 2002.
• D. Edwards, Introduction to Graphical Modelling, 2nd ed., Springer, 2000.
• S. L. Lauritzen, Graphical Models, Oxford, 1996.
• M. I. Jordan (ed.), Learning in Graphical Models, MIT Press, 2001.


Appendix

• Theoretical analysis of approximation error


Types of Approximations: Absolute Error

• An estimate q of P(X=x|e) has absolute error ε if
    P(X=x|e) - ε ≤ q ≤ P(X=x|e) + ε
  or, equivalently,
    q - ε ≤ P(X=x|e) ≤ q + ε
• Not always what we want: for error ε = 0.001
  – Unacceptable if P(X = x | e) = 0.0001
  – Overly precise if P(X = x | e) = 0.3

[Figure: the interval q ± ε, of width 2ε, on the probability line from 0 to 1]


Types of Approximations: Relative Error

• An estimate q of P(X=x|e) has relative error ε if
    P(X=x|e)(1 - ε) ≤ q ≤ P(X=x|e)(1 + ε)
  or, equivalently,
    q/(1 + ε) ≤ P(X=x|e) ≤ q/(1 - ε)
• The sensitivity of the approximation depends on the actual value of the desired result

[Figure: the interval [q/(1+ε), q/(1-ε)] around q on the probability line from 0 to 1]


Complexity

• Recall that exact inference is NP-hard
• Is approximate inference any easier?
• Construction used for exact inference:
  – Input: a 3-SAT formula φ
  – Output: a BN such that P(X = t) > 0 iff φ is satisfiable


Complexity: Relative Error

• Suppose q is an ε-relative-error estimate of P(X = t)
• If φ is not satisfiable, then P(X = t) = 0, so
    0 = P(X = t)(1 - ε) ≤ q ≤ P(X = t)(1 + ε) = 0
  Thus, if q > 0, then φ is satisfiable
• An immediate consequence:

  Thm: Given ε, finding an ε-relative-error approximation is NP-hard


Complexity: Absolute Error

• Thm: If ε < 0.5, then finding an estimate of P(X=x|e) with absolute error ε is NP-hard


Likelihood Weighting

• Can we ensure that all of our samples satisfy e?
• One simple solution:
  – When we need to sample a variable whose value is fixed by e, use the specified value
• For example, in the two-node network X → Y, suppose we know Y = 1
  – Sample X from P(X)
  – Then take Y = 1
• Is this a sample from P(X, Y | Y = 1)?


Likelihood Weighting

• Problem: these are samples of X drawn from the prior P(X)
• Solution:
  – Penalize samples for which P(Y=1|X) is small
• We now sample as follows (network X → Y):
  – Let x[i] be a sample from P(X)
  – Let w[i] = P(Y = 1 | X = x[i])

    P(X = x | Y = 1) ≈ Σ_i w[i] 1(x[i] = x) / Σ_i w[i]


Likelihood Weighting

• Why does this make sense?
• When N is large, we expect about N·P(X = x) samples with x[i] = x
• Thus,
    Σ_{i: x[i]=x} w[i] ≈ N·P(X = x)·P(Y = 1 | X = x) = N·P(X = x, Y = 1)
• When we normalize, we get an approximation of the conditional probability


Likelihood Weighting: Example

Network: Burglary → Alarm ← Earthquake, Earthquake → Radio, Alarm → Call.

CPTs:
  P(b) = 0.03           P(e) = 0.001
  P(a | b, e)  = 0.98   P(a | b, ¬e)  = 0.7
  P(a | ¬b, e) = 0.4    P(a | ¬b, ¬e) = 0.01
  P(c | a) = 0.8        P(c | ¬a) = 0.05
  P(r | e) = 0.3        P(r | ¬e) = 0.001

Walk-through of one weighted sample (B, E, A, C, R), with evidence A = ¬a and R = r:
1. Sample B from P(B), where P(b) = 0.03: get ¬b
2. Sample E from P(E), where P(e) = 0.001: get e
3. A is evidence: fix A = ¬a and set w ← 1 * P(¬a | ¬b, e) = 0.6
4. Sample C from P(C | ¬a), where P(c | ¬a) = 0.05: get ¬c
5. R is evidence: fix R = r and set w ← 0.6 * P(r | e) = 0.6 * 0.3

The result is the sample (¬b, e, ¬a, ¬c, r) with weight 0.18.

Likelihood Weighting: Algorithm

• Let X1, …, Xn be an ordering of the variables consistent with arc direction
• w ← 1
• for i = 1, …, n do
  – if Xi = xi has been observed
      • w ← w * P(Xi = xi | pa_i)
  – else
      • sample xi from P(Xi | pa_i)
• return x1, …, xn, and w
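A direct transcription of this procedure into Python, using the CPT values from the example above; encoding each CPT as a function of the partial sample is our own convention, not the lecture's:

```python
import random

# CPTs for the burglary network, as functions that map the partial
# assignment built so far to P(var = True | parents).
CPTS = {
    'B': lambda x: 0.03,
    'E': lambda x: 0.001,
    'A': lambda x: {(1, 1): 0.98, (1, 0): 0.7,
                    (0, 1): 0.4,  (0, 0): 0.01}[(x['B'], x['E'])],
    'C': lambda x: 0.8 if x['A'] else 0.05,
    'R': lambda x: 0.3 if x['E'] else 0.001,
}
ORDER = ['B', 'E', 'A', 'C', 'R']   # consistent with arc direction

def lw_sample(evidence):
    """One pass of the algorithm above: clamp each evidence variable and
    multiply its CPT probability into w; sample every other variable."""
    x, w = {}, 1.0
    for var in ORDER:
        p_true = CPTS[var](x)
        if var in evidence:
            x[var] = evidence[var]
            w *= p_true if evidence[var] else 1.0 - p_true
        else:
            x[var] = random.random() < p_true
    return x, w

# Estimate P(b | ¬a, r) by normalizing the weighted counts of B.
num = den = 0.0
for _ in range(100_000):
    x, w = lw_sample({'A': False, 'R': True})
    num += w * x['B']
    den += w
print(num / den)
```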


Importance Sampling

• A method for evaluating the expectation of f under P(x), <f>_P(X)
• Discrete:    <f>_P(X) = Σ_x f(x) P(x)
• Continuous:  <f>_P(X) = ∫ f(x) P(x) dx
• If we could sample from P:
    <f>_P(X) ≈ (1/R) Σ_r f(x[r])


Importance Sampling

A general method for evaluating <f>_P(X) when we cannot sample from P(X).
Idea: choose an approximating distribution Q(X) and sample from it.

  <f>_P(X) = ∫ f(x) P(x) dx = ∫ f(x) [P(x)/Q(x)] Q(x) dx = Σ_x f(x) [P(x)/Q(x)] Q(x)

with weight W(x) = P(x)/Q(x). Using this we can now sample from Q:

• If we could generate samples from P(X):
    <f>_P(X) ≈ (1/M) Σ_{m=1}^{M} f(x[m])
• Now that we generate the samples from Q(X):
    <f>_P(X) ≈ (1/M) Σ_{m=1}^{M} f(x[m]) w(m)


(Unnormalized) Importance Sampling

1. For m = 1:M
   – Sample x[m] from Q(X)
   – Calculate w(m) = P(x[m]) / Q(x[m])
2. Estimate the expectation of f(X) using
     <f>_P(X) ≈ (1/M) Σ_{m=1}^{M} f(x[m]) w(m)

Requirements:
• P(x) > 0 implies Q(x) > 0 (don't ignore possible scenarios)
• It is possible to calculate P(x) and Q(x) for a specific x
• It is possible to sample from Q(X)
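A minimal numeric sketch of this procedure under stated assumptions: the target P is a standard normal, the proposal Q is a zero-mean normal with standard deviation 2 (so Q > 0 wherever P > 0), and f(x) = x², whose expectation under P is 1:

```python
import math
import random

def p(x):   # target density P: standard normal N(0, 1)
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def q(x):   # proposal density Q: N(0, 2), heavier tails than P
    return math.exp(-0.5 * (x / 2) ** 2) / (2 * math.sqrt(2 * math.pi))

def importance_estimate(f, m=100_000):
    total = 0.0
    for _ in range(m):
        x = random.gauss(0, 2)        # sample x[m] from Q
        total += f(x) * p(x) / q(x)   # weight w(m) = P(x[m]) / Q(x[m])
    return total / m

print(importance_estimate(lambda x: x * x))   # approx 1.0 = E[X^2] under P
```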


Normalized Importance Sampling

Assume that we cannot evaluate P(X=x) but can evaluate P'(X=x) = αP(X=x) (e.g., in a Bayesian network we can evaluate P(X, e) but not P(X|e)).

We define w'(X) = P'(X)/Q(X). We can then evaluate:

  α = Σ_x P'(x) = Σ_x [P'(x)/Q(x)] Q(x) = <w'(X)>_Q(X)

and then:

  <f>_P(X) = ∫ f(x) P(x) dx
           = (1/α) ∫ f(x) [P'(x)/Q(x)] Q(x) dx
           = (1/α) <f(X) w'(X)>_Q(X)
           = <f(X) w'(X)>_Q(X) / <w'(X)>_Q(X)

In the last step we simply replace α with the expression derived above.


Normalized Importance Sampling

We can now estimate the expectation of f(X), as in unnormalized importance sampling, by sampling x[m] from Q(X) and then computing

  <f>_P(X) ≈ Σ_{m=1}^{M} f(x[m]) w'(m) / Σ_{m=1}^{M} w'(m)

(hence the name "normalized").
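The same numeric sketch, normalized: now the code sees only an unnormalized target P'(x) = exp(-x²/2) (its normalizer plays the role of α), and dividing by the sum of weights makes the unknown constant cancel:

```python
import math
import random

def p_unnorm(x):   # P'(x) = exp(-x^2/2); the normalizer is treated as unknown
    return math.exp(-0.5 * x * x)

def q(x):          # proposal density Q: N(0, 2)
    return math.exp(-0.5 * (x / 2) ** 2) / (2 * math.sqrt(2 * math.pi))

def normalized_is(f, m=100_000):
    num = den = 0.0
    for _ in range(m):
        x = random.gauss(0, 2)
        w = p_unnorm(x) / q(x)   # w'(m), weight against the unnormalized target
        num += f(x) * w
        den += w                 # the weight sum estimates M times alpha
    return num / den

print(normalized_is(lambda x: x * x))   # still approx 1.0: alpha cancels
```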


Importance Sampling: Weaknesses

• Important to choose a sampling distribution with heavy tails
  – So as not to "miss" large values of f
• Many-dimensional importance sampling:
  – The "typical set" of P may take a long time to find, unless Q is a good approximation to P
  – Weights vary by factors exponential in N
• Similar weaknesses apply to likelihood weighting
