Bayesian Networks
Unit 7 Approximate Inference in Bayesian Networks
Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright
Wang, Yuan-Kai (王元凱), [email protected]
http://www.ykwang.tw
Department of Electrical Engineering, Fu Jen University (輔仁大學電機工程系)
2006~2011
Reference this document as: Wang, Yuan-Kai, "Approximate Inference in Bayesian Networks," Lecture Notes of Wang, Yuan-Kai, Fu Jen University, Taiwan, 2011.
Bayesian Networks Unit - Approximate Inference in Bayesian Networks p. 2
Goal of This Unit
• P(X|e) inference for Bayesian networks
• Why approximate inference
– Exact inference is too slow because its complexity is exponential
• Using approximate approaches
– Sampling methods
• Likelihood weighting sampling
• Markov chain Monte Carlo sampling
– Loopy belief propagation
– Variational methods
Related Units
• Background
– Probabilistic graphical model
– Exact inference in BN
• Next units
– Probabilistic inference over time
Self-Study References
• Chapter 14, Artificial Intelligence: A Modern Approach, 2nd ed., S. Russell & P. Norvig, Prentice Hall, 2003.
• B. D'Ambrosio, "Inference in Bayesian Networks," AI Magazine, 1999.
• M. I. Jordan & Y. Weiss, "Probabilistic Inference in Graphical Models."
• C. Andrieu, N. de Freitas, A. Doucet & M. I. Jordan, "An Introduction to MCMC for Machine Learning," Machine Learning, vol. 50, pp. 5-43, 2003.
• W. L. Martinez & A. R. Martinez, Computational Statistics Handbook with MATLAB, Chapman & Hall/CRC, 2002.
– Chapter 3: Sampling Concepts
– Chapter 4: Generating Random Variables
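Structure of Related Lecture Notes
• PGM representation: Unit 5: BN; Unit 9: Hybrid BN; Units 10~15: Naïve Bayes, MRF, HMM, DBN, Kalman filter
• Inference (query problem): Unit 6: Exact inference; Unit 7: Approximate inference; Unit 8: Temporal inference
• Learning (from data): Units 16~: MLE, EM (structure learning, parameter learning)
[Figure: roadmap illustrated with the burglary network B, E → A → J, M and its factors P(B), P(E), P(A|B,E), P(J|A), P(M|A), linking representation, inference (query), and learning (data)]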
Contents
1. Sampling
2. Random Number Generator
3. Stochastic Simulation
4. Markov Chain Monte Carlo
5. Loopy Belief Propagation
6. Variational Methods
7. Implementation
8. Summary
9. References
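4 Steps of Inference
• Step 1: Bayes' theorem
$$P(X \mid E=e) = \frac{P(X, E=e)}{P(E=e)} \propto P(X, E=e)$$
• Step 2: Marginalization over the hidden variables H
$$P(X, E=e) = \sum_{h \in H} P(X, E=e, H=h)$$
• Step 3: Conditional independence (Bayesian network factorization)
$$P(X, E=e, H=h) = \prod_{i=1}^{n} P(X_i \mid Pa(X_i))$$
• Step 4: Product-sum computation (enumeration)
$$P(X \mid E=e) \propto \sum_{h \in H} \prod_{i=1}^{n} P(X_i \mid Pa(X_i))$$
– Exact inference
– Approximate inference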
Five Types of Queries in Inference
• For a probabilistic graphical model G
• Given a set of evidence E = e
• Query the PGM with
– P(e): likelihood query
– arg max P(e): maximum likelihood query
– P(X|e): posterior belief query
– arg max_x P(X=x|e) (single query variable): maximum a posteriori (MAP) query
– arg max_{x1…xk} P(X1=x1, …, Xk=xk|e): most probable explanation (MPE) query
Approximate Inference vs. Exact Inference
• Exact inference: P(X|E) = 0.71828
– Gets the exact probability value
– Uses the inference steps derived from the probabilistic formulas
– Needs exponential time in the worst case
• Approximate inference: P(X|E) ≈ 0.71
– Gets an approximate probability value
– Uses sampling
– Needs only polynomial time (in the number of samples and the network size), so computation is fast
Why Approximate Inference
• Large treewidth
– Large, highly connected graphical models
– Treewidth may be large (>40) even in sparse networks
• In many applications, approximations are sufficient
– Example: P(X = x|e) = 0.3183098861
– P(X = x|e) ≈ 0.3 may be a good enough approximation
– e.g., when we take action only if P(X = x|e) > 0.5
1. Sampling
• 1.1 What Is Sampling
• 1.2 Sampling for Inference
Basic Idea of Sampling
• Why sampling
– Estimate some values by random number generation
1. Sampling
– Random number generation
– Draw N samples from a known distribution P
– Generate N random numbers from a known distribution S
2. Estimation
– Compute an approximate probability P̂ that approximates the real posterior probability P(X|E)
1.1 What Is Sampling
• A very simple example with a random variable: coin toss
– Tossing the coin gives head or tail
– It is a Boolean R.V.
• Coin = head or tail
– If the coin is unbiased, head and tail have equal probability
• A prior probability distribution
P(Coin) = <0.5, 0.5>
• Uniform distribution
– Assume we have a coin but we do not know whether it is unbiased
Sampling of Coin Toss
• Sampling in this example = flipping the coin many (N) times
– e.g., N = 1000 times
– Each flip gives one sample
– Ideally, 500 heads and 500 tails
• P(head) = 500/1000 = 0.5, P(tail) = 500/1000 = 0.5
– In practice, e.g., 501 heads and 499 tails
• P(head) = 501/1000 = 0.501, P(tail) = 499/1000 = 0.499
• After sampling,
– We can estimate the probability distribution
– Check whether the coin is biased
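Sampling & Estimation (Math)
• For a Boolean random variable X
– P(X) is the prior distribution <P(x), P(¬x)>
– Use a sampling algorithm to generate N samples
– Let N(x) be the number of samples in which x is true, and N(¬x) the number in which x is false
$$\hat{P}(x) = \frac{N(x)}{N} \approx P(x), \qquad \hat{P}(\neg x) = \frac{N(\neg x)}{N} \approx P(\neg x)$$
$$\lim_{N \to \infty} \frac{N(x)}{N} = P(x), \qquad \lim_{N \to \infty} \frac{N(\neg x)}{N} = P(\neg x)$$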
1.2 Sampling for Inference
• Given a Bayesian network G over (X1, …, Xn)
– We get the joint probability distribution
$$P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid Pa(X_i))$$
• For a query P(X|E=e)
– $$P(X \mid e) \propto \sum_{h \in H} \prod_{i=1}^{n} P(X_i \mid Pa(X_i))$$
– It is hard to compute
• Exact computation needs time exponential in the number of variables Xi in the worst case
– We will try to use sampling to compute it
Compute P(X|e) by Sampling
• Sampling
– Generate N samples from $P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid Pa(X_i))$
• Estimation
– Use the N samples to estimate P(X, e) ≈ N(X, e)/N
– Use the N samples to estimate P(e) ≈ N(e)/N
– Estimate P(X|e) by P(X, e)/P(e) ≈ N(X, e)/N(e)
Explained in Sections 2, 3, 4
What Is a Sampling Algorithm
• An algorithm to
– Generate samples from a known probability distribution P
– Estimate the approximate probability P̂
Various Sampling Algorithms
• Stochastic simulation (Section 3)
– Direct sampling
– Rejection sampling
• Reject samples disagreeing with the evidence
– Likelihood weighting
• Use the evidence to weight samples
• Markov chain Monte Carlo (MCMC) (Section 4)
– Sample from a stochastic process whose stationary distribution is the true posterior
2. Random Number Generator
• Very important for sampling algorithms
• Introduces basic concepts related to sampling of Bayesian networks
• Subsections
– 2.1 Univariate
– 2.2 Multivariate
RNG in Programming Languages
• Random number generator (RNG)
– C/C++: rand()
– Java: Math.random(), java.util.Random
– Matlab: rand()
• Why should we discuss it?
– These functions generate random numbers with a uniform distribution
– How do we generate
• Gaussian, …
• Multivariate, dependent random variables
• Non-closed-form distributions?
Generate a Random Number (1/2)
• Examples in C
– int i = rand();
– Returns an integer in [0, RAND_MAX]; RAND_MAX is implementation-defined but at least 32767
– It generates integers
• Generate a random number between 1 and n (n ≤ RAND_MAX)
– int i = 1 + (rand() % n);
– (rand() % n) returns a number between 0 and n - 1
– Adding 1 gives a random number between 1 and n
– It generates integers, not real numbers
Generate a Random Number (2/2)
• Ex: integer between 1 and 6
– 1 + (rand() % 6)
• Ex: real number between 0 and 1
– double x = (double) rand() / RAND_MAX;   // cast first: rand() / RAND_MAX alone is integer division
• Exercise
– Real number between 10 and 20
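One possible answer to the exercise, as a small C sketch: uniform01 and uniform_ab are hypothetical helper names. A real number in [a, b] is obtained by scaling a uniform number in [0, 1] by (b - a) and shifting it by a.

```c
#include <stdlib.h>

/* Uniform real number in [0, 1]. The cast makes the division
   floating point; rand() / RAND_MAX alone is integer division
   and yields 0 unless rand() returns exactly RAND_MAX. */
double uniform01(void)
{
    return (double) rand() / RAND_MAX;
}

/* Uniform real number in [a, b]: scale [0, 1] by (b - a), then shift by a. */
double uniform_ab(double a, double b)
{
    return a + (b - a) * uniform01();
}
```

For the exercise, call uniform_ab(10.0, 20.0).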
Generate Many Random Numbers Repeatedly
• Use a loop for repeated generation
– for (int i = 0; i < 1000; i++)
    { rand(); }
– int i, j[1000];
  for (i = 0; i < 1000; i++) { j[i] = 1 + rand() % 6; }
• rand() generates numbers uniformly: a uniform distribution
Why Generate Random Numbers
• Simulate random behavior
• Make random decisions
• Estimate some values
Random Behavior/Decision (1/2)
• Flip a coin for a decision (Boolean)
– Fair: each face has equal probability
– int coin_face;
  if (rand() > RAND_MAX/2) coin_face = 1;
  else coin_face = 0;
– int coin_face;
  coin_face = rand() % 2;
Random Behavior/Decision (2/2)
• Random decision among multiple choices
– Discrete random variable
• Ex: roll a die
– Fair: each face has equal probability
• int die_face; // Random variable; faces numbered 0..5
  die_face = rand() % 6;
• Uniform distribution
Estimation
• If we can simulate a random behavior,
• we can estimate some values
– First, we repeat the random behavior
– Then we estimate the value
Example: The Coin Toss
• Flip the coin 1000 times to estimate the fairness of the coin
– int i, coin_face;           // Random variable
  int frequency[2] = {0, 0};  // counts of face 0 and face 1
  for (i = 0; i < 1000; i++) {
      coin_face = rand() % 2;
      frequency[coin_face]++;
  }
[Figure: bar chart of frequency vs. coin face (0, 1), approximately a uniform distribution]
Example: Area of Circle (Estimation)
• double x, y;   // Two random variables, each a random number in [0, 1]
  int i, N = 1000, NCircle = 0;
  double Area;
  for (i = 0; i < N; i++) {
      x = (double) rand() / RAND_MAX;
      y = (double) rand() / RAND_MAX;
      if ((x*x + y*y) <= 1)
          NCircle = NCircle + 1;
  }
  Area = 4.0 * NCircle / N;   // floating-point division
• x and y are independent
• We call (x, y) a sample
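The same estimator as a self-contained C function (the name estimate_circle_area is hypothetical). Points are drawn in the unit square, the fraction inside the quarter circle approximates π/4, so four times that fraction approximates the area of the unit circle, π.

```c
#include <stdlib.h>

/* Estimate the area of the unit circle (pi) from n samples (x, y),
   with x and y independent and uniform in [0, 1]. */
double estimate_circle_area(int n)
{
    int n_circle = 0;
    for (int i = 0; i < n; i++) {
        double x = (double) rand() / RAND_MAX;
        double y = (double) rand() / RAND_MAX;
        if (x * x + y * y <= 1.0)     /* inside the quarter circle */
            n_circle++;
    }
    return 4.0 * n_circle / n;        /* floating-point division */
}
```

More samples give a tighter estimate; with one million samples the result typically agrees with π to about two decimal places.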
Multiple Dependent Random Variables
• Markov chain: n random variables
[Figure: chain X1 → … → Xk → … → Xn]
• Bayesian network: 5 random variables
[Figure: Burglary → Alarm ← Earthquake; Alarm → John Calls; Alarm → Mary Calls]
• Variables are dependent
• What is a sample?
Sampling
• To randomly generate a sample
– For a random variable X (univariate) or a set of random variables X1, …, Xn (multivariate)
• Boolean, discrete, continuous
• Multivariate
– Independent, dependent
– According to a probability distribution P(X)
• Discrete X: histogram
• Continuous X:
– Uniform, Gaussian, or
– Any distribution, e.g., Gaussian mixture models
Sub-Sections for Generating a Sample
• 2.1 Univariate
– Uniform, Gaussian, Gaussian mixture
• 2.2 Multivariate
– Uniform
– Gaussian
• Independent, dependent
– Any distribution
• Gaussian mixture
– Independent, dependent
• Bayesian network
2.1 Univariate
• For a random variable X
– Boolean, discrete, continuous, hybrid
• We know P(X) is
– Uniform, Gaussian, Gaussian mixture
• Generate a sample X according to P(X)
Uniform Generator
• Every programming language provides a rand()/random() function to generate a uniformly distributed number
– In C: an integer in [0, RAND_MAX]
• Sampling a Boolean uniform number
– rand() % 2
• Sampling a discrete uniform number within [0, d)
– rand() % d
• Sampling a continuous uniform number
– Within [0, 1): rand() / (RAND_MAX + 1.0)
– Within [a, b): a + (b - a) * rand() / (RAND_MAX + 1.0)
Example: Uniform Generator (Matlab)
• x = rand(1, 10000);
• h = hist(x, 20);
• bar(h);
[Figure: bar chart of the 20-bin histogram h; all bins of roughly equal height, about 10000/20 = 500 each]
Gaussian Generator (1/2)
• Gaussian samples can be obtained by transforming uniform samples
• There are functions in C/Java/Matlab to randomly generate a univariate Gaussian real number with (μ, σ) = (0, 1)
– C: Numerical Recipes in C
– Java: Random.nextGaussian()
– Matlab: randn()
• Suppose it is called Gaussian()
Gaussian Generator (2/2)
• Sampling a continuous Gaussian number with (μ, σ)
– (Gaussian() * σ) + μ
• Sampling a discrete Gaussian number with (μ, σ)?
Example: Gaussian Generator (1/2)
• Pseudo code
– Assume Gaussian() is a pseudo function that generates standard Gaussian numbers
– double x[10000];
  for (i = 0; i < 10000; i++)
      x[i] = Gaussian();
– for (i = 0; i < 10000; i++)
      x[i] = μ + Gaussian() * σ;
Example: Gaussian Generator (2/2)
• Matlab
– x = randn(1, 10000);
– h = hist(x, 20);
– bar(h);
• Java
– Random r = new Random();   // java.util.Random
  double[] x = new double[10000];
  for (int i = 0; i < 10000; i++) x[i] = r.nextGaussian();
[Figure: bell-shaped bar chart of the 20-bin histogram h]
Gaussian Mixture Generator (1/2)
• Random variable X with a Gaussian
– P(X) = N(X; μ, σ)
• Random variable Y with a Gaussian mixture
$$P(Y) = \sum_{m} \pi_m \, \mathcal{N}(Y; \mu_m, \sigma_m)$$
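Gaussian Mixture Generator (2/2)
• Generate N samples of X
– for (i = 0; i < N; i++) x[i] = (Gaussian() * σ) + μ;
• Generate N samples of Y with a mixture of M Gaussians
– Each Gaussian m has weight πm, mean μm, and standard deviation σm
– for (m = 0; m < M; m++)
      for (i = 0; i < N*πm; i++)
          y[m][i] = (Gaussian() * σm) + μm;
– This allocates exactly N·πm of the samples to component m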
Example: Gaussian Mixture Generator (Matlab)
• N=10000; pi1=0.8; pi2=0.2;
• mu1=0; mu2=15; sigma1=3; sigma2=5;
• x1 = mu1 + randn(1,N*pi1) * sigma1;
• x2 = mu2 + randn(1,N*pi2) * sigma2;
• x = [x1, x2];
• h=hist(x,50);
• bar(h);
[Figure: bar chart of the 50-bin histogram h, showing a tall mode from the first component and a smaller, wider mode from the second]
2.2 Multivariate
• For random variables X1, …, Xn
– Boolean, discrete, continuous, hybrid
• We know P(X1, …, Xn) is
– Uniform, Gaussian, Gaussian mixture, any distribution
• Generate a sample (X1, …, Xn) according to P(X1, …, Xn)
– Independent
– Dependent
Multivariate Boolean Uniform Generator
• Boolean random variables X1, …, Xn
• int X[n]; // A sample
  for (i = 0; i < n; i++) X[i] = rand() % 2;
Multivariate Discrete Uniform Generator
• Discrete random variables X1, …, Xn
– Each with d discrete values: [0, d-1]
– Each Xi is uniformly distributed
– X1, …, Xn are independent
• int X[n]; // A sample
  for (i = 0; i < n; i++)
      X[i] = rand() % d;
Multivariate Gaussian Generator - Independent (1/2)
• Pseudo code
• For n random variables X = (X1, …, Xn)
– Gaussian: N(X; μ, Σ)
• Mean vector: μ
• Covariance matrix: Σ = [σij]
• X1, …, Xn are independent
– σij = 0 for i ≠ j
• Generating a sample of X = generating each Xi independently
Multivariate Gaussian Generator - Independent (2/2)
• Generate a sample of X = (X1, …, Xn) with μi = 0, σii = 1, σij = 0 for i ≠ j
– double X[n]; // a sample
  for (i = 0; i < n; i++) X[i] = Gaussian();
• Generate a sample of X = (X1, …, Xn) with general μi and σii, σij = 0 for i ≠ j
– double X[n]; // a sample
  for (i = 0; i < n; i++) X[i] = μi + Gaussian() * sqrt(σii); // σii is a variance
Example – Matlab (1/2)
• Plot the density of a 2-D Gaussian with
$$\mu_X = (0, 0)^T, \quad \Sigma_X = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$
mx=[0 0]'; Cx=[1 0; 0 1];
x1=-3:0.1:3; x2=-3:0.1:3;
for i=1:length(x1),
  for j=1:length(x2),
    f(i,j)=(1/(2*pi*sqrt(det(Cx))))*exp((-1/2)*([x1(i) x2(j)]-mx')*inv(Cx)*([x1(i);x2(j)]-mx));
  end
end
mesh(x1,x2,f)
pause;
contour(x1,x2,f)
pause
Example – Matlab (2/2)
• Randomly generate 1000 samples for
$$\mu_X = (0, 0)^T, \quad \Sigma_X = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$
y1=randn(1,1000); y2=randn(1,1000); plot(y1,y2,'.');
Multivariate Gaussian Generator - Dependent (1/4)
• For n random variables X = (X1, …, Xn)
– Gaussian: N(X; μ, Σ)
• Mean vector: μ
• Covariance matrix: Σ = [σij]
– Σ is a positive definite matrix
• Symmetric, with all eigenvalues > 0 (equivalently, all pivots > 0)
– For a general matrix A: A = LDU
• L: lower triangular, U: upper triangular, D: diagonal matrix of pivots
– For a symmetric matrix S: S = LDL^T
– For a positive definite matrix:
$$\Sigma = L D L^T = (L\sqrt{D})(\sqrt{D}L^T) = P P^T, \qquad P = L\sqrt{D}$$
– This is called the Cholesky decomposition
• X1, …, Xn are dependent
– σij ≠ 0
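As a worked instance of Σ = PP^T (added for illustration), take a 2x2 correlation matrix with |ρ| < 1 and solve for the lower-triangular P entry by entry, choosing positive square roots:

```latex
\Sigma = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}, \quad
P = \begin{bmatrix} p_{11} & 0 \\ p_{21} & p_{22} \end{bmatrix}, \quad
P P^T = \begin{bmatrix} p_{11}^2 & p_{11} p_{21} \\ p_{11} p_{21} & p_{21}^2 + p_{22}^2 \end{bmatrix}

p_{11}^2 = 1 \Rightarrow p_{11} = 1, \qquad
p_{11} p_{21} = \rho \Rightarrow p_{21} = \rho, \qquad
p_{21}^2 + p_{22}^2 = 1 \Rightarrow p_{22} = \sqrt{1 - \rho^2}
```

With ρ = 1/2 this gives P = [1, 0; 1/2, √3/2], and with ρ = 9/10 it gives P = [1, 0; 9/10, √19/10], the factors used in the Matlab examples of this section.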
Multivariate Gaussian Generator - Dependent (2/4)
• Generate a sample of X with μ, Σ
– Perform the Cholesky decomposition of Σ
• The Cholesky decomposition is the pivot (LDL^T) decomposition of a positive definite matrix, rewritten as
• Σ = PP^T, with P lower triangular
– Generate independent Gaussians Y = (Y1, …, Yn) with μi = 0, σi = 1
– X = PY + μ
Multivariate Gaussian Generator - Dependent (3/4)
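• Pseudo code to generate a sample of X with μ, Σ
– Matrix Σ;
  Vector μ;
  Vector X(n), Y(n); // a sample
  Matrix P = chol(Σ); // Cholesky decomposition: lower-triangular P with Σ = P * P^T
  for (i = 0; i < n; i++) Y(i) = Gaussian();
  X = P * Y + μ;
– Note: Matlab's chol(Σ) returns the upper-triangular factor R with Σ = R' * R, so use P = chol(Σ)'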
Multivariate Gaussian Generator - Dependent (4/4)
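• Proof
– For n random variables X = (X1, …, Xn) with μ, Σ
– Generate n independent, zero-mean, unit-variance normal random variables Y = (Y1, …, Yn):
$$Y = (Y_1, \dots, Y_n)^T, \quad \mu_Y = (0, \dots, 0)^T, \quad \Sigma_Y = I = \begin{bmatrix} 1 & & 0 \\ & \ddots & \\ 0 & & 1 \end{bmatrix}$$
– Take X = PY + μ, where Σ = PP^T
$$\text{Covariance matrix of } X = E\{(X-\mu)(X-\mu)^T\} = E\{(PY)(PY)^T\} = E\{P Y Y^T P^T\} = P\,E\{Y Y^T\}\,P^T = P I P^T = P P^T = \Sigma$$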
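Example – Matlab (1/4)
• Assume
$$\mu_X = (0, 0)^T, \quad \Sigma_X = \begin{bmatrix} 1 & 1/2 \\ 1/2 & 1 \end{bmatrix} \;\Rightarrow\; P = \begin{bmatrix} 1 & 0 \\ 1/2 & \sqrt{3}/2 \end{bmatrix}$$
• Matlab:
mx=[0 0]'; Cx=[1 1/2; 1/2 1]; P=chol(Cx)';   % transpose: chol returns the upper-triangular factor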
Example – Matlab (2/4)
• Randomly generate 1000 samples for
$$\mu_X = (0, 0)^T, \quad \Sigma_X = \begin{bmatrix} 1 & 1/2 \\ 1/2 & 1 \end{bmatrix}$$
• mx=zeros(2,1000);
  y1=randn(1,1000); y2=randn(1,1000); y=[y1;y2];
  P=[1, 0; 1/2, sqrt(3)/2];
  x=P*y+mx; x1=x(1,:); x2=x(2,:);
  plot(x1,x2,'.');
  r=corrcoef(x1',x2');
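Example – Matlab (3/4)
• Assume
$$\mu_X = (5, 5)^T, \quad \Sigma_X = \begin{bmatrix} 1 & 9/10 \\ 9/10 & 1 \end{bmatrix} \;\Rightarrow\; P = \begin{bmatrix} 1 & 0 \\ 9/10 & \sqrt{19}/10 \end{bmatrix}$$
• Matlab:
• mx=[5 5]';
• Cx=[1 9/10; 9/10 1];
• P=chol(Cx)';   % transpose to get the lower-triangular factor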
Example – Matlab (4/4)
• Randomly generate 1000 samples for
$$\mu_X = (5, 5)^T, \quad \Sigma_X = \begin{bmatrix} 1 & 9/10 \\ 9/10 & 1 \end{bmatrix}$$
• mx=5*ones(2,1000);
  y1=randn(1,1000); y2=randn(1,1000); y=[y1;y2];
  P=[1, 0; 9/10, sqrt(19)/10];
  x=P*y+mx; x1=x(1,:); x2=x(2,:);
  plot(x1,x2,'.');
  r=corrcoef(x1',x2');
Multivariate Gaussian Mixture Generator
• Generate N samples of X with a mixture of M Gaussians (Matlab-like pseudo code)
– for (m = 0; m < M; m++) {
      Matrix P = chol(Σm)'; // lower-triangular Cholesky factor
      for (i = 0; i < N*πm; i++) {
          // Generate n independent normally distributed R.V.s (μ = 0, σ = 1)
          y = randn(n, 1);
          // Transform y into x
          x = P * y + μm;
      }
  }
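Example – Matlab (1/4)
• Combine the previous two Gaussians: π1 = 0.5, π2 = 0.5
$$\mu_1 = (0, 0)^T, \; \Sigma_1 = \begin{bmatrix} 1 & 1/2 \\ 1/2 & 1 \end{bmatrix}; \qquad \mu_2 = (5, 5)^T, \; \Sigma_2 = \begin{bmatrix} 1 & 9/10 \\ 9/10 & 1 \end{bmatrix}$$
[Figure: scatter plot of the mixture samples; x axis from -4 to 10, y axis from -3 to 7]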
Example – Matlab (2/4)
• pi1=0.5; pi2=0.5; N=2000;
mx1=zeros(2,pi1*N); Cx1=[1 1/2; 1/2 1];
P1=chol(Cx1)';   % lower-triangular: P1=[1, 0; 1/2, sqrt(3)/2]
y1_1=randn(1,pi1*N); y1_2=randn(1,pi1*N); y1=[y1_1;y1_2];
x1=P1*y1+mx1; x1_1=x1(1,:); x1_2=x1(2,:);
mx2=5*ones(2,pi2*N); Cx2=[1 9/10; 9/10 1];
P2=chol(Cx2)';   % lower-triangular: P2=[1, 0; 9/10, sqrt(19)/10]
y2_1=randn(1,pi2*N); y2_2=randn(1,pi2*N); y2=[y2_1;y2_2];
x2=P2*y2+mx2; x2_1=x2(1,:); x2_2=x2(2,:);
z1=[x1_1,x2_1]; z2=[x1_2,x2_2];
plot(z1,z2,'.');
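Example – Matlab (3/4)
• Combine the previous two Gaussians: π1 = 0.2, π2 = 0.8
$$\mu_1 = (0, 0)^T, \; \Sigma_1 = \begin{bmatrix} 1 & 1/2 \\ 1/2 & 1 \end{bmatrix}; \qquad \mu_2 = (5, 5)^T, \; \Sigma_2 = \begin{bmatrix} 1 & 9/10 \\ 9/10 & 1 \end{bmatrix}$$
[Figure: scatter plot of the mixture samples; x axis from -4 to 10, y axis from -3 to 7]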
Example – Matlab (4/4)
• pi1=0.2; pi2=0.8; N=2000;
mx1=zeros(2,pi1*N); Cx1=[1 1/2; 1/2 1];
P1=chol(Cx1)';   % lower-triangular: P1=[1, 0; 1/2, sqrt(3)/2]
y1_1=randn(1,pi1*N); y1_2=randn(1,pi1*N); y1=[y1_1;y1_2];
x1=P1*y1+mx1; x1_1=x1(1,:); x1_2=x1(2,:);
mx2=5*ones(2,pi2*N); Cx2=[1 9/10; 9/10 1];
P2=chol(Cx2)';   % lower-triangular: P2=[1, 0; 9/10, sqrt(19)/10]
y2_1=randn(1,pi2*N); y2_2=randn(1,pi2*N); y2=[y2_1;y2_2];
x2=P2*y2+mx2; x2_1=x2(1,:); x2_2=x2(2,:);
z1=[x1_1,x2_1]; z2=[x1_2,x2_2];
plot(z1,z2,'.');
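Exercise
• Write a program to randomly generate 1000 samples of a 3-dimensional Gaussian with μ = (5, 10, -3), Σ = (2,1,3; 4,2,2; 3,1,2)
– Note: the Cholesky method requires Σ to be symmetric positive definite, so check the given Σ first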
Any Distribution
• For random variables X1, …, Xn
– Boolean, discrete, continuous, hybrid
• We know P(X1, …, Xn), but it has no closed-form formula
– Independent: P(X1, …, Xn) = P(X1) … P(Xn)
– Dependent: P(X1, …, Xn) = ∏i P(Xi | Parent(Xi))
• Generate a sample (X1, …, Xn) according to P(X1, …, Xn)
– Independent: generate each Xi by P(Xi)
– Dependent: generate each Xi by P(Xi | Parent(Xi)), parents first
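Two Boolean R.V.s - Independent
• X1, X2 have distributions:
– P(X1) = <0.67, 0.33>, P(X2) = <0.75, 0.25>
• int i, X1, X2;
  for (i = 0; i < 1000; i++) {
      if (rand() > RAND_MAX/3) X1 = 1;   // P(X1=1) ≈ 2/3
      else X1 = 0;
      if (rand() > RAND_MAX/4) X2 = 1;   // P(X2=1) = 3/4
      else X2 = 0;
  }
[Figure: bar charts of P(X1) over values 0 and 1 (P(X1=1) = 0.67) and P(X2) over values 0 and 1 (P(X2=1) = 0.75)]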
Two Boolean R.V.s - Dependent
• X1, X2 have distributions:
– P(X1) = <0.67, 0.33>
– P(X2|X1=T) = <0.75, 0.25>, P(X2|X1=F) = <0.8, 0.2>
• Generate a sample (x1, x2)
  if (rand() > RAND_MAX/3) x1 = 1;
  else x1 = 0;
  if (x1 == 1) {
      if (rand() > RAND_MAX/4) x2 = 1;
      else x2 = 0;
  } else { // x1 == 0
      if (rand() > RAND_MAX/5) x2 = 1;
      else x2 = 0;
  }
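The same parent-then-child idea as a small C function; sample_x1_x2 and uniform01 are hypothetical names, and the probabilities are written directly (0.67 rather than the slide's 2/3 threshold RAND_MAX/3). X1 is sampled first, and its value selects which conditional distribution X2 is drawn from.

```c
#include <stdlib.h>

/* Uniform number in [0, 1). */
static double uniform01(void)
{
    return rand() / ((double) RAND_MAX + 1.0);
}

/* One sample (x1, x2) from P(X1=1) = 0.67, P(X2=1 | X1=1) = 0.75,
   P(X2=1 | X1=0) = 0.8: sample the parent X1 first, then X2 from
   the conditional distribution selected by x1. */
void sample_x1_x2(int *x1, int *x2)
{
    *x1 = uniform01() < 0.67;
    *x2 = uniform01() < (*x1 ? 0.75 : 0.8);
}
```

By the chain rule, P(X2=1) = 0.67·0.75 + 0.33·0.8 = 0.7665, so the fraction of samples with x2 = 1 should approach that value.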
Markov Chain
• Markov chain: n random variables
[Figure: chain X1 → … → Xk → … → Xn]
Bayesian Network
• Example: 5 random variables
[Figure: Burglary → Alarm ← Earthquake; Alarm → John Calls; Alarm → Mary Calls]
3. Stochastic Simulation
• Also called
– Monte Carlo methods
– Sampling methods
• Sub-sections
– 3.1 Direct sampling
– 3.2 Rejection sampling
– 3.3 Likelihood weighting
3.1 Direct Sampling
• Generate N samples randomly
• For the inference P(X|E)
– P(X|E) = P(X ∧ E) / P(E)
– Get N(E) and N(X ∧ E) from the N samples
• N(E): number of samples consistent with E
• N(X ∧ E): number of samples consistent with both X and E
– P(E) ≈ N(E)/N, P(X ∧ E) ≈ N(X ∧ E)/N
– P(X|E) ≈ N(X ∧ E)/N(E)
Example (1/4)
• For the sprinkler network
– Estimate P(w|r) by direct sampling
– 4 random variables
– A sample = (c, s, r, w)
Example (2/4)
• Generate 1000 samples

Cloudy  Sprinkler  Rain  WetGrass
T       T          T     F
F       T          T     F
F       F          T     T
T       T          T     F
T       T          T     F
...     ...        ...   ...
F       T          T     F
Example (3/4)
• P(r|¬w) = P(r, ¬w)/P(¬w)

Cloudy  Sprinkler  Rain  WetGrass
T       T          T     F
F       T          T     F
F       F          T     T
T       T          F     F
...     ...        ...   ...
F       T          T     F

• N¬w: number of samples with WetGrass = False
• Nr∧¬w: number of samples with (Rain = True and WetGrass = False)
• P(r|¬w) ≈ Nr∧¬w / N¬w
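The counting step can be sketched in C as follows; the Sample struct and estimate_rain_given_not_wet are hypothetical names. Applied only to the five rows visible in the table (the "..." rows are unknown), it finds N¬w = 4 and Nr∧¬w = 3, giving 0.75; this is purely an illustration of the counting, not an estimate from all 1000 samples.

```c
/* One sample of the sprinkler network (1 = true, 0 = false). */
typedef struct { int cloudy, sprinkler, rain, wet; } Sample;

/* Estimate P(r | not w) = N(r and not w) / N(not w) from n samples.
   Returns -1.0 if no sample has WetGrass = false. */
double estimate_rain_given_not_wet(const Sample *s, int n)
{
    int n_not_w = 0, n_r_not_w = 0;
    for (int i = 0; i < n; i++) {
        if (!s[i].wet) {              /* sample agrees with evidence not w */
            n_not_w++;
            if (s[i].rain)
                n_r_not_w++;          /* ... and also has Rain = true */
        }
    }
    return n_not_w > 0 ? (double) n_r_not_w / n_not_w : -1.0;
}
```

Samples with WetGrass = True are simply skipped, which is why direct sampling wastes effort when the evidence is unlikely.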
Example (4/4)
• P(R|¬w)
– = P(R, ¬w)/P(¬w)
– = < P(r ∧ ¬w)/P(¬w), P(¬r ∧ ¬w)/P(¬w) >

Cloudy  Sprinkler  Rain  WetGrass
T       T          T     F
F       T          T     F
F       F          T     T
T       T          F     F
...     ...        ...   ...
F       T          T     F
How to Generate a Sample for the Bayesian Network? (1/3)
• The sprinkler Bayesian network
• Assume a sampling order: [Cloudy, Sprinkler, Rain, WetGrass]
• A sample is an atomic event: (cloudy, sprinkler, rain, wetgrass) = (T, F, T, T)
How to Generate a Sample for the Bayesian Network? (2/3)
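• int i, C, S, R, W;   // T = 1, F = 0
  for (i = 0; i < 1000; i++) {
      if (rand() > RAND_MAX/2) C = T; else C = F;
      if (rand() > RAND_MAX/2) S = T; else S = F;
      if (rand() > RAND_MAX/2) R = T; else R = F;
      if (rand() > RAND_MAX/2) W = T; else W = F;
  }
• Incorrect implementation: every variable is sampled independently with probability 0.5, ignoring the CPTs and the dependence on the parents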
How to Generate a Sample for the Bayesian Network? (3/3)
• int i, C, S, R, W;   // T = 1, F = 0
  for (i = 0; i < 1000; i++) {
      if (rand() > RAND_MAX/2) C = T;        // P(c) = 0.5
      else C = F;
      if (C == T) {
          if (rand() > RAND_MAX*0.9) S = T;  // P(s|c) = 0.1
          else S = F;
      } else { // C == F
          if (rand() > RAND_MAX/2) S = T;    // P(s|¬c) = 0.5
          else S = F;
      }
      ...
  }
An Example Generating One Sample (1/8)
• The sampling algorithm
1. Sample from P(Cloudy) = <0.5, 0.5>
– Suppose it returns true
2. Sample from P(Sprinkler|Cloudy=true) = <0.1, 0.9>
– Suppose it returns false
3. Sample from P(Rain|Cloudy=true) = <0.8, 0.2>
– Suppose it returns true
4. Sample from P(WetGrass|Sprinkler=false, Rain=true) = <0.9, 0.1>
– Suppose it returns true
An Example Generating One Sample (2/8)
Samples (C, S, R, W): none yet
An Example Generating One Sample (3/8)
Random sampling: Cloudy
Return: Cloudy = true
Samples (C, S, R, W): (c, _, _, _)
An Example Generating One Sample (4/8)
Random sampling, given Cloudy = true:
1. Sprinkler
2. Rain
Samples (C, S, R, W): (c, _, _, _)
An Example Generating One Sample (5/8)
Random sampling: Sprinkler, given Cloudy = true
Return: Sprinkler = false
Samples (C, S, R, W): (c, ¬s, _, _)
An Example Generating One Sample (6/8)
Random sampling: Rain, given Cloudy = true
Return: Rain = true
Samples (C, S, R, W): (c, ¬s, r, _)
An Example Generating One Sample (7/8)
Random sampling: WetGrass, given Rain = true and Sprinkler = false
Samples (C, S, R, W): (c, ¬s, r, _)
An Example Generating One Sample (8/8)
Random sampling: WetGrass, given Rain = true and Sprinkler = false
Return: WetGrass = true
Samples (C, S, R, W): (c, ¬s, r, w)
The Algorithm (1/2)
• To generate one sample
The Algorithm (2/2)
• In the previous example
– We get a sample [true, false, true, true] of the Bayesian network using PRIOR-SAMPLE
• Sampling a Bayesian network
– Repeat the sampling N times
– We get N samples
• We can use the N samples to compute any query probability in the Bayesian network
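As a sketch of how the N samples answer a query, the function below estimates P(Rain = true | WetGrass = false) by counting $N_{\neg w}$ and $N_{r \wedge \neg w}$, as in the earlier example. It uses the sprinkler CPTs of this unit, with P(r|¬c) = 0.2, P(w|s,¬r) = 0.90 and P(w|¬s,¬r) = 0.0 assumed from AIMA 2e; under those CPTs the exact answer is about 0.119. The function name is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Sprinkler CPTs (0 = false, 1 = true); P(r|~c), P(w|s,~r) and P(w|~s,~r)
   are ASSUMED from the AIMA 2e sprinkler example. */
static const double P_C = 0.5;
static const double P_S_GIVEN_C[2] = {0.5, 0.1};
static const double P_R_GIVEN_C[2] = {0.2, 0.8};
static const double P_W_GIVEN_SR[2][2] = {{0.0, 0.90}, {0.90, 0.99}};

static int bernoulli(double p)
{
    return rand() / (RAND_MAX + 1.0) < p;
}

/* Direct sampling: draw n full samples in topological order and return
   N(r, ~w) / N(~w), an estimate of P(Rain = T | WetGrass = F).
   Returns -1.0 if no sample has WetGrass = F (the ratio is undefined). */
double direct_rain_given_not_wet(int n)
{
    long n_not_w = 0, n_r_not_w = 0;
    assert(n > 0);
    for (int i = 0; i < n; i++) {
        int c = bernoulli(P_C);
        int s = bernoulli(P_S_GIVEN_C[c]);
        int r = bernoulli(P_R_GIVEN_C[c]);
        int w = bernoulli(P_W_GIVEN_SR[s][r]);
        if (!w) {
            n_not_w++;
            n_r_not_w += r;
        }
    }
    return n_not_w > 0 ? (double)n_r_not_w / n_not_w : -1.0;
}
```

Every sample is generated, but only those with WetGrass = false contribute to this particular query; a different query would reuse the same samples with different counters.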
Why It Works (1/2)
• Why can any probability be answered from the samples?
– The N samples are effectively a full joint distribution (FJD) table
Samples:
C  S  R  W
T  T  T  F
F  T  T  F
F  F  T  T
T  T  F  F
...
F  T  T  F
FJD:
C  S  R  W  P
T  T  T  F  0.02
F  T  T  F  0.13
F  F  T  T  0.04
T  T  F  F  0.15
...
Why It Works (2/2)
• A sample is an atomic event (x1, ..., xn)
• $P(x_1, \ldots, x_n) \approx N(x_1, \ldots, x_n) / N$
• Therefore, an FJD is generated from the N samples
• Note: $N < 2^n$
Exercise: Direct Sampling
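Network: Smart and Study are the parents of Prepared; Smart, Prepared, and Fair are the parents of Pass
p(smart) = .8, p(study) = .6, p(fair) = .9
p(prep | Smart, Study):
          smart   ¬smart
study     .9      .7
¬study    .5      .1
p(pass | Smart, Prep, Fair):
          smart, prep   smart, ¬prep   ¬smart, prep   ¬smart, ¬prep
fair      .9            .7             .7             .2
¬fair     .1            .1             .1             .1
Query: What is the probability that a student studied, given that they passed the exam?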
Problems of Direct Sampling
• It needs to generate very many samples to obtain the approximate FJD
• For a conditional query P(X | e)
– Can we approximate the conditional probability directly?
– Yes, the following two algorithms do this
3.2 Rejection Sampling
• $\hat{P}(X \mid e)$ is estimated from the samples agreeing with e
An Example
• Estimate P(Rain | Sprinkler = true) using 100 samples
– 27 samples have Sprinkler = true
– Of these, 8 have Rain = true and 19 have Rain = false
– P(Rain | Sprinkler = true) = Normalize(<8, 19>) = <0.296, 0.704>
• Similar to a basic real-world empirical estimation procedure
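A minimal rejection-sampling sketch for the query P(Rain | Sprinkler = true): full samples are generated in topological order and those that disagree with the evidence are discarded. The CPTs are the sprinkler values of this unit, with P(r|¬c) = 0.2, P(w|s,¬r) = 0.90 and P(w|¬s,¬r) = 0.0 assumed from AIMA 2e; under them the exact answer is <0.3, 0.7> and about 30% of the samples survive. The function name and its `accepted` output parameter are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Sprinkler CPTs (0 = false, 1 = true); P(r|~c), P(w|s,~r) and P(w|~s,~r)
   are ASSUMED from the AIMA 2e sprinkler example. */
static const double P_C = 0.5;
static const double P_S_GIVEN_C[2] = {0.5, 0.1};
static const double P_R_GIVEN_C[2] = {0.2, 0.8};
static const double P_W_GIVEN_SR[2][2] = {{0.0, 0.90}, {0.90, 0.99}};

static int bernoulli(double p)
{
    return rand() / (RAND_MAX + 1.0) < p;
}

/* Rejection sampling for P(Rain = T | Sprinkler = T).
   Stores the number of kept samples in *accepted; returns -1.0 if none. */
double rejection_rain_given_sprinkler(int n, long *accepted)
{
    long n_rain = 0, n_keep = 0;
    assert(n > 0 && accepted != NULL);
    for (int i = 0; i < n; i++) {
        int c = bernoulli(P_C);
        int s = bernoulli(P_S_GIVEN_C[c]);
        int r = bernoulli(P_R_GIVEN_C[c]);
        int w = bernoulli(P_W_GIVEN_SR[s][r]);  /* full prior sample */
        (void)w;              /* WetGrass is neither query nor evidence here */
        if (!s)
            continue;         /* reject: disagrees with Sprinkler = true */
        n_keep++;
        n_rain += r;
    }
    *accepted = n_keep;
    return n_keep > 0 ? (double)n_rain / n_keep : -1.0;
}
```

Roughly 70% of the work is thrown away here because P(s) = 0.3 under these CPTs; with more evidence variables the accepted fraction shrinks further.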
Analysis of Rejection Sampling
• $\hat{P}(X \mid e) = \dfrac{N(X, e)}{N(e)} \approx \dfrac{P(X, e)}{P(e)} = P(X \mid e)$
• Hence rejection sampling returns consistent posterior estimates
• Problem: it is expensive if P(e) is small
– P(e) drops off exponentially with the number of evidence variables!
3.3 Likelihood Weighting
• Avoids the inefficiency of rejection sampling
– by generating only events consistent with the evidence variables e
• Idea
– Fix the evidence variables
– Sample only the hidden variables
– Weight each sample event by the likelihood it accords the evidence
• Events have different weights
An Example (1/9)
• Query P(Rain | sprinkler, wetgrass)
An Example (2/9)
1. Set the weight w = 1.0
2. Sample from P(Cloudy) = <0.5, 0.5>
• Suppose it returns true
3. The evidence is Sprinkler = true, so we set w ← w × P(sprinkler | cloudy) = 1.0 × 0.1 = 0.1
4. Sample from P(Rain | cloudy) = <0.8, 0.2>
• Suppose it returns true
5. The evidence is WetGrass = true, so we set w ← w × P(wetgrass | sprinkler, rain) = 0.1 × 0.99 = 0.099
A sample event (true, true, true, true) with weight 0.099
An Example (3/9)
w = 1.0
An Example (4/9)
w = 1.0
An Example (5/9)
w = 1.0
An Example (6/9)
w = 1.0 × 0.1
An Example (7/9)
w = 1.0 × 0.1
An Example (8/9)
w = 1.0 × 0.1
An Example (9/9)
w = 1.0 × 0.1 × 0.99 = 0.099
The Algorithm (1/2)
• The example generates a sample event (true, true, true, true) for the query P(Rain | sprinkler, wetgrass)
• Repeat the sampling N times
– We get N sample events
– Each event has a likelihood weight w
– Let $w_1$ be the sum of the weights of the events with rain = true, and $w_2$ the sum for rain = false
• $P(Rain \mid sprinkler, wetgrass) \approx \langle w_1/(w_1 + w_2),\ w_2/(w_1 + w_2) \rangle$
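A minimal likelihood-weighting sketch for P(Rain | sprinkler, wetgrass). Cloudy and Rain are sampled; the evidence variables Sprinkler and WetGrass are fixed and contribute their CPT entries to the weight, so `lw_weight(1, 1)` reproduces the 0.1 × 0.99 = 0.099 of the worked example. CPT entries not shown in this unit (P(r|¬c) = 0.2, P(w|s,¬r) = 0.90, P(w|¬s,¬r) = 0.0) are assumed from AIMA 2e; under them the exact answer is about 0.32. The function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Sprinkler CPTs (0 = false, 1 = true); P(r|~c), P(w|s,~r) and P(w|~s,~r)
   are ASSUMED from the AIMA 2e sprinkler example. */
static const double P_C = 0.5;
static const double P_S_GIVEN_C[2] = {0.5, 0.1};
static const double P_R_GIVEN_C[2] = {0.2, 0.8};
static const double P_W_GIVEN_SR[2][2] = {{0.0, 0.90}, {0.90, 0.99}};

static int bernoulli(double p)
{
    return rand() / (RAND_MAX + 1.0) < p;
}

/* Likelihood weight of a sample with hidden values (c, r) when the evidence
   is Sprinkler = T and WetGrass = T: P(s | c) * P(w | s, r). */
double lw_weight(int c, int r)
{
    return P_S_GIVEN_C[c] * P_W_GIVEN_SR[1][r];
}

/* Likelihood weighting estimate of P(Rain = T | sprinkler, wetgrass). */
double lw_rain_given_s_w(int n)
{
    double w_true = 0.0, w_false = 0.0;    /* w1 and w2 on the slide */
    assert(n > 0);
    for (int i = 0; i < n; i++) {
        int c = bernoulli(P_C);             /* hidden: sampled from P(C)   */
        int r = bernoulli(P_R_GIVEN_C[c]);  /* hidden: sampled from P(R|c) */
        double w = lw_weight(c, r);         /* evidence: fixed, weighted   */
        if (r)
            w_true += w;
        else
            w_false += w;
    }
    return w_true / (w_true + w_false);
}
```

No sample is rejected: every draw is consistent with the evidence and contributes its weight to either $w_1$ or $w_2$.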
The Algorithm (2/2)
Exercise: Likelihood Weighting
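Network: Smart and Study are the parents of Prepared; Smart, Prepared, and Fair are the parents of Pass
p(smart) = .8, p(study) = .6, p(fair) = .9
p(prep | Smart, Study):
          smart   ¬smart
study     .9      .7
¬study    .5      .1
p(pass | Smart, Prep, Fair):
          smart, prep   smart, ¬prep   ¬smart, prep   ¬smart, ¬prep
fair      .9            .7             .7             .2
¬fair     .1            .1             .1             .1
Query: What is the probability that a student studied, given that they passed the exam?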
Analysis (1/3)
• Why does the algorithm work for P(X | E = e)?
• Let the sampling probability for WEIGHTED-SAMPLE be $S_{WS}$
– The evidence variables E are fixed to e
– All the other variables are $Z = \{X\} \cup Y$
– The algorithm samples each variable in Z given its parent values:
$S_{WS}(z, e) = \prod_{i=1}^{l} P(z_i \mid parents(Z_i))$
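Analysis (2/3)
• The likelihood weight w for a given sample (z, e) = (x, y, e) is
$w(z, e) = \prod_{i=1}^{m} P(e_i \mid parents(E_i))$
• The weighted probability of a sample (z, e) = (x, y, e) is
$S_{WS}(z, e)\, w(z, e) = \prod_{i=1}^{l} P(z_i \mid parents(Z_i)) \prod_{i=1}^{m} P(e_i \mid parents(E_i)) = P(x, y, e)$
• since, for a Bayesian network, $P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid parents(X_i))$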
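Analysis (3/3)
$\hat{P}(x \mid e) = \alpha \sum_{y} N_{WS}(x, y, e)\, w(x, y, e)$
$\approx \alpha' \sum_{y} S_{WS}(x, y, e)\, w(x, y, e)$
$= \alpha' \sum_{y} P(x, y, e) = \alpha' P(x, e) = P(x \mid e)$
So the algorithm works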
Discussions
• Likelihood weighting is efficient because it uses all the samples generated
• However, its performance degrades as the number of evidence variables increases, because
– Most samples will have very low weights
– The weighted estimate will be dominated by the tiny fraction of samples that accord more than an infinitesimal likelihood to the evidence
4. Inference by MCMC
• Key idea
– The sampling process is a Markov chain
• The next sample depends on the previous one
– It can approximate any posterior distribution
• The "state" of the network = the current assignment to all variables
• Generate the next state
– by sampling one variable given its Markov blanket
• Sample each variable in turn, keeping the evidence fixed
The Markov Chain
• With Sprinkler = true and WetGrass = true fixed, there are four states: the four combinations of values of Cloudy and Rain
Markov Blanket Sampling
• The Markov blanket of Cloudy is
– Sprinkler and Rain
• The Markov blanket of Rain is
– Cloudy, Sprinkler, and WetGrass
• The probability given the Markov blanket is calculated as follows:
$P(x'_i \mid mb(X_i)) = \alpha\, P(x'_i \mid parents(X_i)) \prod_{Z_j \in Children(X_i)} P(z_j \mid parents(Z_j))$
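The Markov blanket formula can be evaluated directly for the two hidden variables of the sprinkler network. This sketch assumes the sprinkler CPTs of this unit, with P(r|¬c) = 0.2, P(w|s,¬r) = 0.90 and P(w|¬s,¬r) = 0.0 taken from AIMA 2e; α is applied by dividing by the sum over both values of the variable. The function names are hypothetical.

```c
#include <assert.h>

/* Sprinkler CPTs (0 = false, 1 = true); P(r|~c), P(w|s,~r) and P(w|~s,~r)
   are ASSUMED from the AIMA 2e sprinkler example. */
static const double P_C = 0.5;
static const double P_S_GIVEN_C[2] = {0.5, 0.1};
static const double P_R_GIVEN_C[2] = {0.2, 0.8};
static const double P_W_GIVEN_SR[2][2] = {{0.0, 0.90}, {0.90, 0.99}};

/* P(X = x | parents) for a binary X, given p_true = P(X = T | parents). */
static double bin(double p_true, int x)
{
    return x ? p_true : 1.0 - p_true;
}

/* P(Cloudy = T | mb) with mb = {Sprinkler = s, Rain = r}:
   alpha * P(c) * P(s | c) * P(r | c)   (Sprinkler, Rain are children). */
double p_cloudy_given_mb(int s, int r)
{
    double t = P_C * bin(P_S_GIVEN_C[1], s) * bin(P_R_GIVEN_C[1], r);
    double f = (1.0 - P_C) * bin(P_S_GIVEN_C[0], s) * bin(P_R_GIVEN_C[0], r);
    return t / (t + f);
}

/* P(Rain = T | mb) with mb = {Cloudy = c, Sprinkler = s, WetGrass = w}:
   alpha * P(r | c) * P(w | s, r)   (WetGrass is the only child). */
double p_rain_given_mb(int c, int s, int w)
{
    double t = P_R_GIVEN_C[c] * bin(P_W_GIVEN_SR[s][1], w);
    double f = (1.0 - P_R_GIVEN_C[c]) * bin(P_W_GIVEN_SR[s][0], w);
    assert(t + f > 0.0);
    return t / (t + f);
}
```

For example, with Cloudy = true, Sprinkler = true and WetGrass = true, $P(r \mid mb) = 0.8 \cdot 0.99 / (0.8 \cdot 0.99 + 0.2 \cdot 0.9) \approx 0.815$; only the CPT of Rain and of its child WetGrass are touched.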
An Example (1/2)
• Estimate P(Rain | sprinkler, wetgrass)
• Loop N times
– Sample Cloudy or Rain given its Markov blanket
• Count the number of times Rain = true and Rain = false in the samples
An Example (2/2)
• E.g., visit 100 states
– 31 have Rain = true
– 69 have Rain = false
• P(Rain | sprinkler, wetgrass) = Normalize(<31, 69>) = <0.31, 0.69>
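A minimal Gibbs-sampling sketch of this example. Assumptions: the sprinkler CPTs of this unit, with P(r|¬c) = 0.2, P(w|s,¬r) = 0.90 and P(w|¬s,¬r) = 0.0 taken from AIMA 2e; one step resamples Cloudy and then Rain (a systematic sweep, rather than picking one variable at random); and an initial burn-in period is discarded. Under these CPTs the exact answer is about 0.32. All names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Sprinkler CPTs (0 = false, 1 = true); P(r|~c), P(w|s,~r) and P(w|~s,~r)
   are ASSUMED from the AIMA 2e sprinkler example. */
static const double P_C = 0.5;
static const double P_S_GIVEN_C[2] = {0.5, 0.1};
static const double P_R_GIVEN_C[2] = {0.2, 0.8};
static const double P_W_GIVEN_SR[2][2] = {{0.0, 0.90}, {0.90, 0.99}};

static int bernoulli(double p)
{
    return rand() / (RAND_MAX + 1.0) < p;
}

/* P(Cloudy = T | Sprinkler = T, Rain = r) = alpha P(c) P(s|c) P(r|c). */
static double p_cloudy(int r)
{
    double t = P_C * P_S_GIVEN_C[1] * (r ? P_R_GIVEN_C[1] : 1.0 - P_R_GIVEN_C[1]);
    double f = (1.0 - P_C) * P_S_GIVEN_C[0] * (r ? P_R_GIVEN_C[0] : 1.0 - P_R_GIVEN_C[0]);
    return t / (t + f);
}

/* P(Rain = T | Cloudy = c, Sprinkler = T, WetGrass = T) = alpha P(r|c) P(w|s,r). */
static double p_rain(int c)
{
    double t = P_R_GIVEN_C[c] * P_W_GIVEN_SR[1][1];
    double f = (1.0 - P_R_GIVEN_C[c]) * P_W_GIVEN_SR[1][0];
    return t / (t + f);
}

/* Gibbs estimate of P(Rain = T | sprinkler, wetgrass). The state is
   (Cloudy, Rain) with the evidence held fixed; the fraction of post-burn-in
   states with Rain = T is returned. */
double gibbs_rain_given_s_w(int n_steps, int burn_in)
{
    int c = 1, r = 1;                 /* arbitrary initial state */
    long n_rain = 0, n_kept = 0;
    assert(burn_in >= 0 && n_steps > burn_in);
    for (int step = 0; step < n_steps; step++) {
        c = bernoulli(p_cloudy(r));   /* sample Cloudy given its Markov blanket */
        r = bernoulli(p_rain(c));     /* sample Rain given its Markov blanket   */
        if (step >= burn_in) {
            n_kept++;
            n_rain += r;
        }
    }
    return (double)n_rain / n_kept;
}
```

Consecutive states are correlated, so many more steps are needed than the 100 states of the slide to get within a percentage point of the exact value.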
The Algorithm
Why It Works
• Skipped
– Details are on pp. 517-518 of the AIMA 2e textbook
Sub-Sections
• 4.1 Markov chain theory
• 4.2 Two MCMC sampling algorithms
4.1 Markov Chain Theory
• Suppose X1, X2, … take some set of values
– Without loss of generality (wlog), these values are 1, 2, ...
• A Markov chain is a process that corresponds to the network:
X1 → X2 → X3 → … → Xn → …
• To quantify the chain, we need to specify
– Initial probability: $P(X_1)$
– Transition probability: $P(X_{t+1} \mid X_t)$
• A Markov chain has a stationary transition probability if $P(X_{t+1} \mid X_t)$ is the same for all times t
Irreducible Chains
• A state j is accessible from state i if there is an n such that $P(X_n = j \mid X_1 = i) > 0$
– There is a positive probability of reaching j from i after some number of steps
• A chain is irreducible if every state is accessible from every state
Ergodic Chains
• A state i is positively recurrent if there is a finite expected time to get back to state i after being in state i
– If X has a finite number of states, then it suffices that i is accessible from every state that is accessible from i
• A chain is ergodic if it is irreducible and every state is positively recurrent
(A)periodic Chains
• A state i is periodic if there is an integer d > 1 such that $P(X_{1+n} = i \mid X_1 = i) = 0$ whenever n is not divisible by d
• Intuition: state i may occur only every d steps
• A chain is aperiodic if it contains no periodic state
Stationary Probabilities
Thm:
• If a chain is ergodic and aperiodic, then the limit $\lim_{n \to \infty} P(X_n \mid X_1 = i)$ exists and does not depend on i
• Moreover, let $P^*(X = j) = \lim_{n \to \infty} P(X_n = j \mid X_1 = i)$; then $P^*(X)$ is the unique probability satisfying
$P^*(X = j) = \sum_{i} P(X_{t+1} = j \mid X_t = i)\, P^*(X = i)$
Stationary Probabilities
• The probability P*(X) is the stationary probability of the process
• Regardless of the starting point, the process will converge to this probability
• The rate of convergence depends on properties of the transition probability
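The convergence claim can be checked numerically by repeatedly applying $P_{t+1}(j) = \sum_i P(X_{t+1} = j \mid X_t = i)\, P_t(i)$. In this sketch the states are indexed from 0, the transition matrix is stored row-major, and the example two-state matrix is made up for illustration; its stationary probability is (5/6, 1/6), from $P^*(0) \cdot 0.1 = P^*(1) \cdot 0.5$. The names `propagate`, `evolve` and `MAX_STATES` are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_STATES 16

/* One step of P_{t+1}(j) = sum_i P(X_{t+1} = j | X_t = i) * P_t(i).
   T is k-by-k, row-major: T[i*k + j] = P(X_{t+1} = j | X_t = i). */
static void propagate(int k, const double *T, const double *p, double *out)
{
    for (int j = 0; j < k; j++) {
        out[j] = 0.0;
        for (int i = 0; i < k; i++)
            out[j] += T[i * k + j] * p[i];
    }
}

/* Replace the distribution p (length k) by the distribution reached
   after `steps` transitions. */
void evolve(int k, const double *T, double *p, int steps)
{
    double next[MAX_STATES];
    assert(k > 0 && k <= MAX_STATES && steps >= 0);
    for (int t = 0; t < steps; t++) {
        propagate(k, T, p, next);
        memcpy(p, next, (size_t)k * sizeof next[0]);
    }
}
```

With T = [[0.9, 0.1], [0.5, 0.5]], starting from either (1, 0) or (0, 1), 100 steps give (5/6, 1/6) to within round-off. The second eigenvalue of this matrix is 0.4, so the distance to P* shrinks by a factor of 0.4 per step; a second eigenvalue closer to 1 would mean slower convergence.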
Sampling from the Stationary Probability
• This theory suggests how to sample from the stationary probability:
– Set X1 = i, for some random/arbitrary i
– For t = 1, 2, …, n − 1
• Sample a value $x_{t+1}$ for $X_{t+1}$ from $P(X_{t+1} \mid X_t = x_t)$
– Return $x_n$
• If n is large enough, then this is a sample from P*(X)
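A sketch of this procedure, assuming a row-major transition matrix with 0-based state indices; the names `sample_next` and `run_chain` are hypothetical. Running it many times from the same arbitrary start and keeping the last state of each run gives independent samples from P*(X).

```c
#include <assert.h>
#include <stdlib.h>

/* Draw X_{t+1} from row i of the k-by-k row-major transition matrix T
   (T[i*k + j] = P(X_{t+1} = j | X_t = i)) by inverting its CDF. */
static int sample_next(int k, const double *T, int i)
{
    double u = rand() / (RAND_MAX + 1.0), acc = 0.0;
    for (int j = 0; j < k - 1; j++) {
        acc += T[i * k + j];
        if (u < acc)
            return j;
    }
    return k - 1;   /* remaining mass; also absorbs round-off */
}

/* Set X_1 = x1, then for t = 1, ..., n-1 sample x_{t+1} from
   P(X_{t+1} | X_t = x_t); return x_n. */
int run_chain(int k, const double *T, int x1, int n)
{
    int x = x1;
    assert(k > 0 && x1 >= 0 && x1 < k && n >= 1);
    for (int t = 1; t < n; t++)
        x = sample_next(k, T, x);
    return x;
}
```

For T = [[0.9, 0.1], [0.5, 0.5]], whose stationary probability is (5/6, 1/6), roughly 5/6 of the runs of length 50 started in state 1 end in state 0, even though the start was the less likely state.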
Designing Markov Chains
• How do we construct the right chain to sample from?
– Ensuring aperiodicity and irreducibility is usually easy
• The problem is ensuring the desired stationary probability
Designing Markov Chains
Key tool:
• If the transition probability satisfies
$\dfrac{P(X_{t+1} = j \mid X_t = i)}{P(X_{t+1} = i \mid X_t = j)} = \dfrac{Q(X = j)}{Q(X = i)}$ whenever $P(X_{t+1} = j \mid X_t = i) > 0$
then $P^*(X) = Q(X)$
• This gives a local criterion for checking that the chain will have the right stationary distribution
MCMC Methods
• We can use these results to sample from $P(X_1, \ldots, X_n \mid e)$
Idea:
• Construct an ergodic and aperiodic Markov chain such that $P^*(X_1, \ldots, X_n) = P(X_1, \ldots, X_n \mid e)$
• Simulate the chain for n steps to get a sample
MCMC Methods
Notes:
• The Markov chain variable Y takes as values the assignments to all variables that are consistent with the evidence:
$V(Y) = \{(x_1, \ldots, x_n) \in V(X_1) \times \cdots \times V(X_n) \mid x_1, \ldots, x_n \text{ satisfy } e\}$
• For simplicity, we will denote such a state using the vector of variables
4.2 Two MCMC Sampling Algorithms
• Gibbs Sampler
• Metropolis-Hastings Sampler
Gibbs Sampler
• One of the simplest MCMC methods
• Each transition changes the state of one $X_i$
• The transition probability is defined by P itself, as a stochastic procedure:
– Input: a state $x_1, \ldots, x_n$
– Choose i at random (uniform probability)
– Sample $x'_i$ from $P(X_i \mid x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n, e)$
– Let $x'_j = x_j$ for all $j \neq i$
– Return $x'_1, \ldots, x'_n$
Correctness of Gibbs Sampler
• How do we show correctness?
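Correctness of Gibbs Sampler
• By the chain rule
$P(x_1, \ldots, x_{i-1}, x_i, x_{i+1}, \ldots, x_n \mid e) = P(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n \mid e)\, P(x_i \mid x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n, e)$
• Thus, we get
$\dfrac{P(x_1, \ldots, x_{i-1}, x_i, x_{i+1}, \ldots, x_n \mid e)}{P(x_1, \ldots, x_{i-1}, x'_i, x_{i+1}, \ldots, x_n \mid e)} = \dfrac{P(x_i \mid x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n, e)}{P(x'_i \mid x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n, e)}$
where the right-hand side is the ratio of the transition probabilities
• Since we choose i from the same distribution at each stage, this procedure satisfies the ratio criterion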
Gibbs Sampling for Bayesian Network
• Why is the Gibbs sampler “easy” in BNs?
• Recall that the Markov blanket of a variable separates it from the other variables in the network
– $P(X_i \mid X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n) = P(X_i \mid Mb_i)$
• This property allows us to use local computations to perform sampling in each transition
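Gibbs Sampling in Bayesian Networks
• How do we evaluate $P(X_i \mid x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n)$?
• Let $Y_1, \ldots, Y_k$ be the children of $X_i$
– By definition of $Mb_i$, the parents of $Y_j$ are in $Mb_i \cup \{X_i\}$
• It is easy to show that
$P(x_i \mid Mb_i) = \dfrac{P(x_i \mid Pa_i) \prod_j P(y_j \mid pa_{y_j})}{\sum_{x'_i} P(x'_i \mid Pa_i) \prod_j P(y_j \mid pa_{y_j})}$
where, in the denominator, $pa_{y_j}$ is evaluated with $X_i = x'_i$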
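Metropolis-Hastings
• More general than Gibbs (Gibbs is a special case of M-H)
• Proposal distribution: an arbitrary q(x'|x) that is ergodic and aperiodic (e.g., uniform)
• The transition to x' happens with probability
$\alpha(x' \mid x) = \min\left(1, \dfrac{P(x')\, q(x \mid x')}{P(x)\, q(x' \mid x)}\right)$
• Useful when P(x) cannot be computed exactly but can be evaluated up to a normalizing constant, because only the ratio P(x')/P(x) is needed
• q(x'|x) = 0 implies P(x') = 0 or q(x|x') = 0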
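A minimal Metropolis-Hastings sketch on a small discrete target, assuming the uniform proposal q(x'|x) = 1/k over k states. Since this q is symmetric, the acceptance probability reduces to min(1, P(x')/P(x)), so the target only has to be known up to a constant. The function name, the initial state 0 and the burn-in parameter are illustrative choices, not part of the slide.

```c
#include <assert.h>
#include <stdlib.h>

static double uniform01(void)
{
    return rand() / (RAND_MAX + 1.0);
}

/* Metropolis-Hastings over states 0..k-1 with q(x'|x) = 1/k.
   p_tilde[] is the target up to a constant; visits after the first
   burn_in steps are tallied in counts[0..k-1]. */
void mh_discrete(int k, const double *p_tilde, int n_steps, int burn_in,
                 long *counts)
{
    int x = 0;                                   /* initial state */
    assert(k > 0 && p_tilde[0] > 0.0);
    assert(burn_in >= 0 && n_steps > burn_in);
    for (int j = 0; j < k; j++)
        counts[j] = 0;
    for (int t = 0; t < n_steps; t++) {
        int x_new = (int)(uniform01() * k);      /* propose x' ~ q(.|x) */
        double ratio = p_tilde[x_new] / p_tilde[x];
        if (uniform01() < ratio)                 /* accept w.p. min(1, ratio) */
            x = x_new;
        if (t >= burn_in)
            counts[x]++;                         /* a rejection repeats x */
    }
}
```

With unnormalized weights {1, 2, 3, 4}, the visit frequencies approach 0.1, 0.2, 0.3 and 0.4. Counting the current state again after a rejected proposal is essential; skipping it would bias the estimate.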
Sampling Strategy
• How do we collect the samples?
Strategy I:
• Run the chain M times, each for N steps
– Each run starts from a different starting point
• Return the last state in each run
Sampling Strategy
Strategy II:
• Run one chain for a long time
• After some “burn-in” period, sample points every fixed number of steps
Comparing Strategies
Strategy I:
– Better chance of “covering” the space of points, especially if the chain is slow to reach stationarity
– Has to perform “burn-in” steps for each chain
Strategy II:
– Performs “burn-in” only once
– Samples might be correlated (although only weakly)
Hybrid strategy:
– Run several chains, and sample a few times from each
– Combines the benefits of both strategies
Short Summary - Approximate Inference
• Monte Carlo (sampling with positive and negative error) methods:
– Pros: simplicity of implementation and a theoretical guarantee of convergence
– Cons: can be slow to converge, and convergence is hard to diagnose
• Variational methods: your presentation
• Loopy belief propagation and generalized belief propagation: your presentation
Exercise: MCMC Sampling
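Network: Smart and Study are the parents of Prepared; Smart, Prepared, and Fair are the parents of Pass
p(smart) = .8, p(study) = .6, p(fair) = .9
p(prep | Smart, Study):
          smart   ¬smart
study     .9      .7
¬study    .5      .1
p(pass | Smart, Prep, Fair):
          smart, prep   smart, ¬prep   ¬smart, prep   ¬smart, ¬prep
fair      .9            .7             .7             .2
¬fair     .1            .1             .1             .1
Query: What is the probability that a student studied, given that they passed the exam?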
Main Computational Problems
1. Difficult to tell whether convergence has been achieved
2. Can be wasteful if the Markov blanket is large
– P(Xi | MB(Xi)) won't change much (law of large numbers)
5. Loopy Belief Propagation
• TBU
6. Variational Methods
• TBU
7. Implementation by PNL
Algorithm | PNL | GeNIe
Enumeration | v (Naïve) |
Variable Elimination | |
Belief Propagation | v (Pearl) | v (Polytree)
Junction Tree | v | v (Clustering)
Direct Sampling | | v (Logic)
Likelihood Sampling | v (LWSampling) | v (Likelihood sampling)
MCMC Sampling | v (GibbsWithAnneal) | (other 5 samplings)
8. Summary
• Exact inference by variable elimination
– Polytime on polytrees
– NP-hard on general graphs
– Space = time, very sensitive to topology
Summary
• Approximate inference by LW and MCMC
– LW does poorly when there is lots of (downstream) evidence
– LW and MCMC are generally insensitive to topology
– Convergence can be very slow with probabilities close to 1 or 0
– Can handle arbitrary combinations of discrete and continuous variables
Summary
• What we know
– What a Bayesian network is
– How to perform inference, given a Bayesian network
• However, we still need to know
– How to learn CPTs
– How to build, or automatically learn, the structure of a Bayesian network from a given set of data
Appendix
• Theoretical analysis of approximation error
Types of Approximations
Absolute error
• An estimate q of P(X = x | e) has absolute error ε if
$P(X = x \mid e) - \varepsilon \le q \le P(X = x \mid e) + \varepsilon$
equivalently
$q - \varepsilon \le P(X = x \mid e) \le q + \varepsilon$
• Not always what we want: an absolute error of 0.001 is
– Unacceptable if P(X = x | e) = 0.0001
– Overly precise if P(X = x | e) = 0.3
Types of Approximations
Relative error
• An estimate q of P(X = x | e) has relative error ε if
$P(X = x \mid e)(1 - \varepsilon) \le q \le P(X = x \mid e)(1 + \varepsilon)$
equivalently
$q/(1 + \varepsilon) \le P(X = x \mid e) \le q/(1 - \varepsilon)$
• The sensitivity of the approximation depends on the actual value of the desired result
Complexity
• Recall that exact inference is NP-hard
• Is approximate inference any easier?
• Construction for exact inference:
– Input: a 3-SAT problem φ
– Output: a BN such that P(X = t) > 0 iff φ is satisfiable
Complexity: Relative Error
• Suppose that q is an ε-relative error estimate of P(X = t)
• If φ is not satisfiable, then P(X = t) = 0, so
$0 = P(X = t)(1 - \varepsilon) \le q \le P(X = t)(1 + \varepsilon) = 0$
Thus, if q > 0, then φ is satisfiable
An immediate consequence:
Thm: Given ε, finding an ε-relative error approximation is NP-hard
Complexity: Absolute Error
• Thm: If ε < 0.5, then finding an estimate of P(X = x | e) with absolute error ε is NP-hard
Likelihood Weighting
• Can we ensure that all of our samples satisfy e?
• One simple solution:
– When we need to sample a variable that is assigned a value by e, use the specified value
• For example, in the network X → Y, we know Y = 1
– Sample X from P(X)
– Then take Y = 1
• Is this a sample from P(X, Y | Y = 1)?
Likelihood Weighting
• Problem: these samples of X are drawn from P(X), not from P(X | Y = 1)
• Solution:
– Penalize samples in which P(Y = 1 | X) is small
• We now sample as follows:
– Let x[i] be a sample from P(X)
– Let w[i] be P(Y = 1 | X = x[i])
$P(X = x \mid Y = 1) \approx \dfrac{\sum_{i : x[i] = x} w[i]}{\sum_{i} w[i]}$
Likelihood Weighting
• Why does this make sense?
• When N is large, we expect to sample N·P(X = x) samples with x[i] = x
• Thus,
$\sum_{i : x[i] = x} w[i] \approx N\, P(X = x)\, P(Y = 1 \mid X = x) = N\, P(X = x, Y = 1)$
• When we normalize, we get an approximation of the conditional probability
Likelihood Weighting: Example on the Burglary Network
• Network: Burglary → Alarm ← Earthquake, Alarm → Call, Earthquake → Radio
• CPTs: P(b) = 0.03; P(e) = 0.001; P(a | B, E) = 0.98, 0.4, 0.7, 0.01 for the four (B, E) configurations; P(c | A) = 0.8, 0.05 for A = a, ¬a; P(r | E) = 0.3, 0.001 for E = e, ¬e
• Evidence: Radio and Alarm are observed; the sample is built in the order B, E, A, C, R
– Sample B: weight = 1
– Sample E: weight = 1
– A is observed, so it is set to its evidence value: weight = 0.6
– Sample C: weight = 0.6
– R is observed, so it is set to its evidence value: weight = 0.6 × 0.3 = 0.18
Likelihood Weighting
• Let X1, …, Xn be an order of the variables consistent with the arc directions
• w = 1
• for i = 1, …, n do
– if Xi = xi has been observed
• w ← w · P(Xi = xi | pai)
– else
• sample xi from P(Xi | pai)
• return x1, …, xn, and w
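Importance Sampling
• A method for evaluating the expectation of f under P(X), $\langle f \rangle_{P(X)}$
• Discrete: $\langle f \rangle_{P(X)} = \sum_{x} f(x)\, P(x)$
• Continuous: $\langle f \rangle_{P(X)} = \int f(x)\, P(x)\, dx$
• If we could sample from P: $\langle f \rangle_{P(X)} \approx \dfrac{1}{R} \sum_{r=1}^{R} f(x[r])$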
Importance Sampling
A general method for evaluating $\langle f \rangle_{P(X)}$ when we cannot sample from P(X).
Idea: choose an approximating distribution Q(X) and sample from it:
$\langle f \rangle_{P(X)} = \int f(x)\, P(x)\, dx = \int f(x)\, \dfrac{P(x)}{Q(x)}\, Q(x)\, dx = \langle f(X)\, W(X) \rangle_{Q(X)}, \quad W(X) = \dfrac{P(X)}{Q(X)}$
If we could generate samples from P(X): $\langle f \rangle_{P(X)} \approx \dfrac{1}{M} \sum_{m=1}^{M} f(x[m])$
Now that we generate the samples from Q(X): $\langle f \rangle_{P(X)} \approx \dfrac{1}{M} \sum_{m=1}^{M} f(x[m])\, w(m)$
(Unnormalized) Importance Sampling
1. For m = 1, …, M: sample x[m] from Q(X) and calculate w(m) = P(x[m]) / Q(x[m])
2. Estimate the expectation of f(X) using
$$\langle f(X) \rangle_{P(X)} \approx \frac{1}{M}\sum_{m=1}^{M} f(x[m])\,w(m)$$
Requirements:
• P(x) > 0 ⇒ Q(x) > 0 (don't ignore possible scenarios)
• It is possible to calculate P(X) and Q(X) for a specific X = x
• It is possible to sample from Q(X)
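The two steps above can be sketched as follows, assuming a hypothetical target P(X) over {0, …, 5} and a uniform Q(X), which satisfies the requirement Q(x) > 0 wherever P(x) > 0. The function and parameter names are illustrative assumptions; for this P the exact value of ⟨X⟩_P is 3.85.

```python
import random


def importance_sampling(f, p, q_sample, q_prob, m, seed=0):
    """Unnormalized IS estimate of <f>_P: (1/M) * sum_m f(x[m]) * w(m)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(m):
        x = q_sample(rng)       # step 1: sample x[m] from Q(X)
        w = p(x) / q_prob(x)    # w(m) = P(x[m]) / Q(x[m]); needs Q > 0 where P > 0
        total += f(x) * w
    return total / m            # step 2: average of the weighted values


P = [0.05, 0.05, 0.1, 0.1, 0.2, 0.5]   # hypothetical target P(X) over {0, ..., 5}
est = importance_sampling(
    f=lambda x: x,
    p=lambda x: P[x],
    q_sample=lambda rng: rng.randrange(6),   # Q(X): uniform over {0, ..., 5}
    q_prob=lambda x: 1 / 6,
    m=100_000,
)
print(est)
```

Note that the code never draws from P; it only evaluates P at the points drawn from Q, which is exactly the second requirement on the slide.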
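Normalized Importance Sampling
Assume that we cannot evaluate P(X = x) but can evaluate P'(X = x) = αP(X = x) for some unknown constant α (e.g., in a Bayesian network we can evaluate P(X, e) but not P(X | e)). We define w'(X) = P'(X)/Q(X). We can then evaluate
$$\langle w'(X) \rangle_{Q(X)} = \int Q(x)\,\frac{P'(x)}{Q(x)}\,dx = \int P'(x)\,dx = \alpha$$
and then
$$\langle f(X) \rangle_{P(X)} = \int P(x)\,f(x)\,dx = \int Q(x)\,f(x)\,\frac{P(x)}{Q(x)}\,dx = \frac{1}{\alpha}\int Q(x)\,f(x)\,\frac{P'(x)}{Q(x)}\,dx = \frac{1}{\alpha}\,\langle f(X)\,w'(X) \rangle_{Q(X)} = \frac{\langle f(X)\,w'(X) \rangle_{Q(X)}}{\langle w'(X) \rangle_{Q(X)}}$$
In the last step we simply replace α with the expression for $\langle w'(X) \rangle_{Q(X)}$ obtained above.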
Normalized Importance Sampling
We can now estimate the expectation of f(X) similarly to unnormalized importance sampling, by sampling x[m] from Q(X) and then computing
$$\langle f(X) \rangle_{P(X)} \approx \frac{\sum_{m=1}^{M} f(x[m])\,w'(m)}{\sum_{m=1}^{M} w'(m)}$$
(hence the name "normalized").
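A minimal sketch of the normalized estimator, using a continuous example that is an assumption (not from the slides): P is the standard normal, but the code only evaluates the unnormalized P'(x) = exp(-x²/2), so the constant α = √(2π) is never used, and Q is a normal with standard deviation 2. For f(x) = x² the true value of ⟨f⟩_P is 1. The function name is illustrative.

```python
import math
import random


def normalized_importance_sampling(f, p_unnorm, q_sample, q_pdf, m, seed=0):
    """Self-normalized IS: sum_m f(x[m]) w'(m) / sum_m w'(m), with w' = P'/Q.

    P' only needs to be proportional to P; the unknown constant cancels.
    """
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(m):
        x = q_sample(rng)            # sample x[m] from Q(X)
        w = p_unnorm(x) / q_pdf(x)   # w'(m) = P'(x[m]) / Q(x[m])
        num += f(x) * w
        den += w
    return num / den


Q_SIGMA = 2.0
est = normalized_importance_sampling(
    f=lambda x: x * x,
    p_unnorm=lambda x: math.exp(-0.5 * x * x),  # N(0, 1) without its 1/sqrt(2*pi) factor
    q_sample=lambda rng: rng.gauss(0.0, Q_SIGMA),
    q_pdf=lambda x: math.exp(-0.5 * (x / Q_SIGMA) ** 2) / (Q_SIGMA * math.sqrt(2 * math.pi)),
    m=100_000,
)
print(f"E_P[X^2] ~= {est:.3f}  (exact value 1)")
```

In a Bayesian network, P' would play the role of P(X, e), and the ratio of weight sums has the same form as the likelihood-weighting estimate.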
Importance Sampling Weaknesses
• It is important to choose a sampling distribution with heavy tails
  – so as not to "miss" large values of f
• Many-dimensional importance sampling:
  – The "typical set" of P may take a long time to find, unless Q is a good approximation to P
  – Weights vary by factors exponential in N, the number of dimensions
• Likelihood weighting has similar weaknesses
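To illustrate the last two bullets, the sketch below (a toy example assumed for illustration, not from the slides) runs importance sampling with P = N(0, 1)^N and a heavier-tailed Q = N(0, 1.5²)^N, and reports the effective sample size (Σw)²/Σw², a common diagnostic of weight degeneracy. Even though Q has heavier tails than P in every coordinate, the per-coordinate weight ratios multiply, so the ESS out of 10,000 samples should collapse as N grows. The function names are hypothetical.

```python
import math
import random


def log_normal_pdf(x, sigma):
    """Log density of a zero-mean normal with standard deviation sigma."""
    return -0.5 * (x / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def effective_sample_size(dim, m=10_000, q_sigma=1.5, seed=0):
    """ESS = (sum w)^2 / sum w^2 for IS with P = N(0,1)^dim, Q = N(0,q_sigma^2)^dim."""
    rng = random.Random(seed)
    log_w = []
    for _ in range(m):
        xs = [rng.gauss(0.0, q_sigma) for _ in range(dim)]
        # log w = sum over coordinates of log P(x_i) - log Q(x_i)
        log_w.append(sum(log_normal_pdf(x, 1.0) - log_normal_pdf(x, q_sigma) for x in xs))
    top = max(log_w)                         # shift before exp to avoid overflow;
    w = [math.exp(lw - top) for lw in log_w]  # the shift cancels in the ESS ratio
    return sum(w) ** 2 / sum(v * v for v in w)


ess = {dim: effective_sample_size(dim) for dim in (1, 10, 50)}
for dim, value in ess.items():
    print(f"N = {dim:2d}: ESS ~= {value:.0f} of 10000 samples")
```

Working in log space is a design choice: with 50 dimensions the raw weights span many orders of magnitude, and subtracting the largest log weight keeps the exponentials finite without changing the ESS.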