TRANSCRIPT
Approximate Inference: Decomposition Methods with
Applications to Computer Vision
Kyomin Jung (KAIST)
Joint work with Pushmeet Kohli (Microsoft Research) Devavrat Shah (MIT)
Seoul National University, July 30th, 2009
Graphical Model
A probabilistic model for which a graph denotes the conditional independence structure between random variables.
- Bayesian network (directed graph)
- Markov Random Field (undirected graph)
Recently successful in machine learning.
Markov Random Field (MRF)
Developed from the Ising model in statistical physics.
Applications: computer vision, error-correcting codes, speech recognition, gene finding, etc.
Many heuristics have been devised for inference problems in MRFs, but little is known about theoretical guarantees for their correctness.
Our goal: design simple inference algorithms with provable error bounds by utilizing the structure of the MRF.
Outline
- Problem statement and an example
- Relevant work
- Our algorithms for approximate inference
  - Efficient algorithms based on local updates
  - When the MRF is defined on a graph with polynomial growth, our algorithm achieves approximation within arbitrary accuracy
- Applications: image denoising, image segmentation
- Conclusion
Markov Random Field (MRF)
A collection of random variables X = (X_i)_{i∈V}, defined on a graph G.
The probability distribution of X_i at vertex i depends only on its neighbors N(i):
Pr[X_i = x_i | X_j = x_j, j ≠ i] = Pr[X_i = x_i | X_j = x_j, j ∈ N(i)]
(Figure: a graph G with vertex i carrying variable X_i.)
Pair-wise MRF
X is a pair-wise MRF if
P(x) := Pr[X = x] = (1/Z) exp( Σ_{i∈V} φ_i(x_i) + Σ_{(i,j)∈E} ψ_ij(x_i, x_j) )
for some φ_i : Σ → R and ψ_ij : Σ² → R.
Z is called the partition function.
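As a concrete toy instance of this definition, a three-node chain can be evaluated by brute force in a few lines; the graph, alphabet, and potentials φ, ψ below are illustrative choices, not from the talk.

```python
import itertools, math

# A minimal pair-wise MRF evaluated by brute force (illustrative potentials).
V = [0, 1, 2]
E = [(0, 1), (1, 2)]
SIGMA = [0, 1]

def phi(i, xi):                     # node potential phi_i
    return 0.5 * xi

def psi(i, j, xi, xj):              # edge potential psi_ij: rewards agreement
    return 1.0 if xi == xj else 0.0

def weight(x):
    """exp of the total potential of assignment x."""
    s = sum(phi(i, x[i]) for i in V)
    s += sum(psi(i, j, x[i], x[j]) for i, j in E)
    return math.exp(s)

# The partition function Z sums the weights of all |SIGMA|^|V| assignments.
Z = sum(weight(x) for x in itertools.product(SIGMA, repeat=len(V)))

def P(x):
    return weight(x) / Z

total = sum(P(x) for x in itertools.product(SIGMA, repeat=len(V)))  # should be 1
```

The brute-force sum over |Σ|^|V| assignments is exactly what becomes intractable on large graphs, which motivates the approximation algorithms in this talk.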
Computing Maximum A Posteriori
MAP (Maximum A Posteriori) assignment x* = argmax_{x∈Σ^n} P(x):
the most likely assignment (mode of the distribution).
NP-hard even for simple graphs like the grid.
Our goal: for a given ε > 0, compute an approximation x̂ ∈ Σ^n of the MAP x* such that
P(x̂) ≥ (1 − ε) P(x*).
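On a small instance the MAP assignment can be found by enumeration; since Z is a constant over x, maximizing P(x) is the same as maximizing the exponent. The chain and potentials below are again illustrative choices.

```python
import itertools

# Brute-force MAP on a tiny chain MRF (illustrative potentials).
V = [0, 1, 2, 3]
E = [(0, 1), (1, 2), (2, 3)]

def exponent(x):
    node = sum(0.3 * x[i] for i in V)                       # phi_i(x_i) = 0.3*x_i
    edge = sum(1.0 if x[i] == x[j] else 0.0 for i, j in E)  # agreement reward
    return node + edge

# argmax over all 2^4 assignments; Z cancels, so comparing exponents suffices.
x_star = max(itertools.product([0, 1], repeat=len(V)), key=exponent)
```

Here the node term pushes every bit toward 1 and the edge term toward agreement, so the enumeration returns the all-ones assignment.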
Example: Image denoising
We want to restore a binary (−1/+1) image Y of size 100 × 100 to which noise has been added.
Consider Y as an element of {−1, +1}^10000.
Use an MRF model to restore the original image; the underlying graph is a grid graph of size 100 × 100.
Example: Image denoising
Utilize two properties of the original image:
- it is similar to Y
- it is smooth, i.e. the number of edges with different colors is small
Define the following MRF on X ∈ {−1, +1}^10000:
P(X) ∝ exp( Σ_{s∈V} X_s Y_s + Σ_{(s,t)∈E} X_s X_t )
MAP assignment X*: the restored (original) image.
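A minimal sketch of this denoising model, shrunk to a 3×3 grid so the MAP can be found by enumeration (the talk's 100×100 grid is exactly where enumeration fails and an approximation algorithm is needed). The smoothing weight lam is an assumed parameter, not from the talk.

```python
import itertools

# Denoising MRF on a tiny 3x3 binary grid: P(X) ∝ exp(sum_s X_s*Y_s + lam*sum X_s*X_t).
n = 3
V = [(r, c) for r in range(n) for c in range(n)]
E = [((r, c), (r, c + 1)) for r in range(n) for c in range(n - 1)] + \
    [((r, c), (r + 1, c)) for r in range(n - 1) for c in range(n)]

Y = {v: 1 for v in V}
Y[(1, 1)] = -1                                     # one noisy pixel in an all-+1 image

lam = 1.0                                          # assumed smoothing weight
def score(X):
    data = sum(X[v] * Y[v] for v in V)             # "similar to Y"
    smooth = lam * sum(X[s] * X[t] for s, t in E)  # "smooth": few disagreeing edges
    return data + smooth

# MAP by enumeration over all 2^9 binary images.
best = max((dict(zip(V, bits)) for bits in itertools.product([-1, 1], repeat=len(V))),
           key=score)
```

The smoothness term outvotes the single noisy pixel, so the MAP image is the clean all-+1 image, illustrating why the MAP assignment serves as the restored image.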
Computing partition function Z
Equivalent to computing marginal probabilities: for x ∈ {0, 1}^|V|,
Pr[X_1 = 1] = Z_1 / Z and Pr[X_1 = 0] = Z_0 / Z, where
Z_σ = Σ_{x ∈ {0,1}^|V|, x_1 = σ} exp( Σ_{i∈V} φ_i(x_i) + Σ_{(i,j)∈E} ψ_ij(x_i, x_j) ).
An ε-approximation of log Z is useful for many applications, including statistical physics and computer vision.
Our goal: compute Z_L and Z_U such that
(1 − ε) log Z ≤ log Z_L ≤ log Z and log Z ≤ log Z_U ≤ (1 + ε) log Z.
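The identity Pr[X_1 = σ] = Z_σ / Z can be checked by brute force on a small MRF; the triangle graph and potentials below are arbitrary illustrative choices.

```python
import itertools, math

# Marginals from restricted partition functions: Pr[X_1 = s] = Z_s / Z.
V = [0, 1, 2]
E = [(0, 1), (1, 2), (0, 2)]       # a triangle

def weight(x):
    expo = sum(0.2 * x[i] for i in V)                        # node potentials
    expo += sum(0.7 * (1 if x[i] == x[j] else 0) for i, j in E)  # edge potentials
    return math.exp(expo)

def Z_restricted(s=None):
    """Sum of weights over assignments, optionally restricted to x_1 = s."""
    total = 0.0
    for x in itertools.product([0, 1], repeat=len(V)):
        if s is None or x[0] == s:
            total += weight(x)
    return total

Z = Z_restricted()
p1 = Z_restricted(1) / Z           # Pr[X_1 = 1]
p0 = Z_restricted(0) / Z           # Pr[X_1 = 0]
```

Since the two restricted sums partition the full sum, p0 + p1 = 1, which is the sense in which marginals and partition functions are equivalent computations.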
Relevant Work
Belief Propagation (BP)
BP and its variants, like the Tree-Reweighted algorithm, have been very successful when G does not have many small cycles.
Ex) good when G is locally tree-like and the MRF has correlation decay [Jordan, Tatikonda '99].
When G has lots of small cycles, their correctness is not known.
Pearl ['88], Weiss ['00], Yedidia and Freeman ['02], Wainwright, Jaakkola and Willsky ['03]
Relevant Work
Markov Chain Monte Carlo
Computes an approximation of log Z; the key is to prove a rapid mixing property, which is non-trivial.
Jerrum and Sinclair ['89], Dyer, Frieze and Kannan ['91]
Recent development: Weitz ['06], using the self-avoiding walk tree approach.
Deterministic computation of Z for graphs with degree < 6; cannot be applied to graphs with higher degree.
Our approach
Computing approximations of the MAP and the log-partition function for general graphs is NP-hard.
However, many real applications of the MRF model are defined on polynomially growing graphs.
We utilize structural properties of polynomially growing graphs to obtain approximation algorithms.
Polynomially growing graph
B(v, r): the ball of radius r around v w.r.t. the shortest path distance of G.
Example (2-D grid): |B(v, 0)| = 1, |B(v, 1)| = 5, |B(v, 2)| = 13, and in general |B(v, r)| = O(r²).
Definition: a (sequence of) graphs is polynomially growing if there are constants C, ρ ≥ 0 such that for all v ∈ V and r ∈ Z₊,
|B(v, r)| ≤ C · r^ρ.
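The grid numbers above can be reproduced with a breadth-first search; for the 2-D grid, |B(v, r)| = 2r² + 2r + 1 (away from the boundary), so the definition holds with, e.g., C = 5 and ρ = 2 for r ≥ 1.

```python
from collections import deque

# Ball sizes on a 2-D grid via BFS, checking polynomial growth |B(v,r)| <= C*r^rho.
n = 21
def neighbors(v):
    r, c = v
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        if 0 <= r + dr < n and 0 <= c + dc < n:
            yield (r + dr, c + dc)

def ball_size(v, radius):
    """|B(v, radius)| w.r.t. shortest-path distance, via breadth-first search."""
    dist = {v: 0}
    q = deque([v])
    while q:
        u = q.popleft()
        if dist[u] == radius:
            continue                       # do not expand past the radius
        for w in neighbors(u):
            if w not in dist:
                dist[w] = dist[u] + 1
                q.append(w)
    return len(dist)

center = (10, 10)                          # far from the boundary of the 21x21 grid
sizes = [ball_size(center, r) for r in range(4)]   # 1, 5, 13, 25
```

By contrast, on a binary tree |B(v, r)| grows exponentially in r, which is why tree-like expanders fall outside this graph class.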
Outline of our algorithm: MAP
- Begin with a random assignment x̂ ∈ Σ^n.
- Choose an arbitrary order of the vertices.
- With the given vertex as a center, choose a ball of radius r, where r is drawn from a geometric distribution.
- Compute a MAP assignment inside the ball while fixing the assignment outside the ball.
- Update x̂ by the computed MAP inside the ball.
- Output x̂.
We show that x̂ is an approximation of the MAP.
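A minimal sketch of these steps on a toy 4×4 grid MRF, assuming illustrative potentials. The MAP inside each ball is computed by exhaustive enumeration (feasible only because the balls are tiny), and the geometric radius is truncated at 2 to keep the demo fast; neither shortcut is part of the actual algorithm's analysis.

```python
import itertools, random

# Local-update MAP sketch on a 4x4 grid with random node potentials.
random.seed(1)
n = 4
V = [(r, c) for r in range(n) for c in range(n)]
E = [((r, c), (r, c + 1)) for r in range(n) for c in range(n - 1)] + \
    [((r, c), (r + 1, c)) for r in range(n - 1) for c in range(n)]
adj = {v: [] for v in V}
for s, t in E:
    adj[s].append(t); adj[t].append(s)

phi = {v: random.uniform(-1, 1) for v in V}        # node potential phi_v(x) = phi[v]*x
lam = 0.5                                          # edge potential lam * x_s * x_t

def total_score(x):
    return sum(phi[v] * x[v] for v in V) + sum(lam * x[s] * x[t] for s, t in E)

def ball(v, radius):
    """Vertices within graph distance `radius` of v."""
    frontier, seen = {v}, {v}
    for _ in range(radius):
        frontier = {w for u in frontier for w in adj[u]} - seen
        seen |= frontier
    return seen

eps = 0.3
x = {v: random.choice([-1, 1]) for v in V}         # random initial assignment
scores = [total_score(x)]
for v in V:                                        # arbitrary order of vertices
    radius = 1                                     # geometric Pr[r=i] = eps*(1-eps)^(i-1),
    while radius < 2 and random.random() > eps:    # truncated at 2 for this toy
        radius += 1
    B = sorted(ball(v, radius))
    # Exact MAP inside the ball, with the assignment outside the ball held fixed.
    def ball_score(vals):
        y = dict(x); y.update(zip(B, vals))
        return total_score(y)
    best = max(itertools.product([-1, 1], repeat=len(B)), key=ball_score)
    x.update(zip(B, best))
    scores.append(total_score(x))
```

Because the current assignment on the ball is always among the enumerated candidates, each update can only increase the total score, which is visible in the recorded score trace.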
Our MAP Algorithm
(Figure: a ball of radius r_1 = 2 is chosen around v_1, then a ball of radius r_2 = 1 around v_2.)
Each radius is drawn from a geometric distribution: Pr[r_v = i] = ε(1 − ε)^{i−1} for i = 1, 2, 3, ...
Property of the geometric distribution
Pr[r = i] = ε(1 − ε)^{i−1}, so for any q ∈ N,
Pr[r = q] / Pr[r ≥ q] = ε.
Hence, for any edge e,
Pr[e is on the boundary of B(v, r)] ≤ ε · Pr[e is inside the ball B(v, r)].
(Figure: a ball around v with an edge e on its boundary.)
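The memorylessness-style identity behind this bound can be checked numerically: with Pr[r = i] = ε(1 − ε)^{i−1}, the tail is Pr[r ≥ q] = (1 − ε)^{q−1}, so the ratio equals ε for every q.

```python
# Numerical check that Pr[r = q] / Pr[r >= q] = eps for a geometric distribution.
eps = 0.25

def pmf(i):
    return eps * (1 - eps) ** (i - 1)

def tail(q):
    """Pr[r >= q], summed far enough out that the truncation error is negligible."""
    return sum(pmf(i) for i in range(q, q + 2000))

ratios = [pmf(q) / tail(q) for q in range(1, 6)]   # each ratio should equal eps
```

This is what makes cutting at a geometric radius useful: conditioned on reaching distance q, the chance the cut falls exactly there is always ε, independent of q.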
Proof for MAP Algorithm
Consider an imaginary boundary of the algorithm as follows: for any edge e of the graph G,
Pr[e belongs to the boundary of the algorithm] ≤ ε.
Polynomial growth ⇒ the size of each ball is small ⇒ the computation is efficient.
Proof of approximation
If we restrict x̂ to a region R, it is a MAP assignment in R with some fixed assignment outside R.
Also, x* restricted to the region R is a MAP assignment in R with another fixed assignment outside R.
Proof of approximation
We show the following lemma: if the total difference of the potential functions of two MRFs X¹ and X² on R is small, then the difference between the probabilities induced by the MAP assignments for X¹ and X² on R is small.
Proof of approximation
By this lemma and the fact that, for any edge e of G,
Pr[e belongs to the boundary of the algorithm] ≤ ε,
we obtain that the sum over all regions of the differences between the probabilities induced by x̂ and x* is small.
Theorem [Jung, Shah]
For the computation of the MAP, our algorithm achieves a (1 − ε)-approximate solution in expectation, and it runs in time O(n).
Outline of Our Algorithm: log-partition function
- Obtain a random graph decomposition by removing some edges.
- Compute the log-partition function inside each connected component, while replacing the potential functions of the removed boundary edges of the component by constants.
- Sum the computed values and output the result.
We show that the output is an approximation of the log-partition function.
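A sketch of these steps on a 6-vertex chain with a single removed boundary edge: replacing the removed potential ψ by its maximum (resp. minimum) value gives an upper (resp. lower) bound on log Z that factorizes over the components. The graph and potentials are illustrative choices.

```python
import itertools, math

# Decomposition bounds for log Z on a chain: remove boundary edge set B, solve
# each component exactly, and bound the removed potentials by constants.
V = list(range(6))
E = [(i, i + 1) for i in range(5)]
B = [(2, 3)]                               # removing this edge splits the chain

def phi(i, xi):                            # node potential
    return 0.4 * xi

def psi(i, j, xi, xj):                     # edge potential
    return 0.8 if xi == xj else -0.8

def log_Z(vertices, edges):
    """Exact log partition function of a sub-MRF by enumeration."""
    idx = {v: k for k, v in enumerate(vertices)}
    total = 0.0
    for x in itertools.product([0, 1], repeat=len(vertices)):
        e = sum(phi(v, x[idx[v]]) for v in vertices)
        e += sum(psi(i, j, x[idx[i]], x[idx[j]]) for i, j in edges)
        total += math.exp(e)
    return math.log(total)

exact = log_Z(V, E)

# Components left after removing B, each solved independently.
comps = [([0, 1, 2], [(0, 1), (1, 2)]), ([3, 4, 5], [(3, 4), (4, 5)])]
psi_B = [psi(2, 3, a, b) for a in (0, 1) for b in (0, 1)]
upper = sum(log_Z(vs, es) for vs, es in comps) + max(psi_B)
lower = sum(log_Z(vs, es) for vs, es in comps) + min(psi_B)
```

Since the true boundary potential lies between its minimum and maximum for every assignment, the sandwiching lower ≤ log Z ≤ upper holds, and the gap is controlled by how few boundary edges the random decomposition cuts.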
Graph decomposition
(Figure: balls of radius r_1 = 2 around v_1 and r_2 = 1 around v_2 are cut out; each radius is drawn as Pr[r_v = i] = ε(1 − ε)^{i−1} for i = 1, 2, 3, ...)
Recall
Z = Σ_{x ∈ Σ^|V|} exp( Σ_{i∈V} φ_i(x_i) + Σ_{(i,j)∈E} ψ_ij(x_i, x_j) ).
Proof of approximation bounds
Ex, for the upper bound, replace each boundary-edge potential ψ_ij by a constant upper bound ψ_ij^U:
Z ≤ Z_U = Σ_{x ∈ Σ^|V|} exp( Σ_{i∈V} φ_i(x_i) + Σ_{(i,j)∈E∖B} ψ_ij(x_i, x_j) + Σ_{(i,j)∈B} ψ_ij^U ),
which factorizes over the regions:
log Z_U = Σ_R log Σ_{x ∈ Σ^|R|} exp( Σ_{i∈R} φ_i(x_i) + Σ_{(i,j)∈E_R} ψ_ij(x_i, x_j) ) + Σ_{(i,j)∈B} ψ_ij^U,
where the R are the regions (connected components) and B is the set of boundary edges.
Theorem [Jung, Shah]
For the computation of log Z, our algorithm outputs approximate upper and lower bounds on log Z in expectation, and it runs in time O(n).
Application to Image Processing
In computer vision, the underlying graph is a grid.
Relevant problems:
- Image denoising
- Image segmentation / reconstruction: detect a specific object in an image. Ex) face recognition, medical image processing.
MRF with fixed ratio*
We require that the ratio of a specific part of an object is close to a fixed ratio.
Ex) face segmentation: fix the ratios of the eye, nose, mouth, etc.
For the computation of the MAP with a fixed ratio, we provide an algorithm that outputs an approximate solution in time O(n^k), where k is the number of objects.
* Joint work with Kohli
Future work
- Adapting existing algorithms to the computations in each component
- Learning the underlying Markov Random Field
- Understanding limitations of inference algorithms
Thank you