
ISSN 0249-6399

INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET EN AUTOMATIQUE

Research Report No. 1938, version 2. August 1993; revised October 1994.

PROGRAMME 4: Robotique, image et vision

A Hierarchical Markov Random Field Model and Multi-Temperature Annealing for Parallel Image Classification

Zoltan KATO, Marc BERTHOD, Josiane ZERUBIA


Hierarchical Markov Fields and Multi-Temperature Annealing: Application to Image Classification with Parallel Algorithms

Zoltan KATO, Marc BERTHOD and Josiane ZERUBIA

INRIA - 2004 Route des Lucioles - BP 93, 06902 Sophia Antipolis Cedex - FRANCE
Tel: (33) 93 65 78 57 - Fax: (33) 93 65 76 43
email: [email protected], [email protected], [email protected]

Abstract

In this report, we are interested in image classification using multiscale relaxation algorithms implemented in a massively parallel way. Multigrid techniques are well known to improve significantly both the convergence rate and the quality of the results of iterative relaxation methods. First, we present a classical multiscale model which consists of working on a pyramid of labels while keeping the whole observation field. The potential functions at coarser grids are obtained by very simple computations. The optimization is first carried out at a coarse scale by a parallel relaxation algorithm; the next finer level is then initialized by projecting the result obtained at the coarser scale. Second, we propose a hierarchical Markovian model built from the previous one. We introduce new interactions between neighboring levels of the pyramid, which makes it possible to work with cliques whose sites are far apart, at a reasonable cost. This model leads to a relaxation algorithm that uses a new kind of annealing: Multi-Temperature Annealing. The idea is to associate high temperatures with the coarsest levels, making them less sensitive to local minima. We have proved the convergence of this algorithm toward a global optimum by generalizing the theorem of Geman and Geman.

Key Words

Markov random fields, multiscale, hierarchical model, relaxation algorithms, supervised image classification.

Acknowledgments

This work was partially funded by CNES, AFIRST and GdR TdSI (DRED/CNRS).


A Hierarchical Markov Random Field Model and Multi-Temperature Annealing for Parallel Image Classification

Zoltan KATO, Marc BERTHOD and Josiane ZERUBIA

INRIA - 2004 Route des Lucioles - BP 93, 06902 Sophia Antipolis Cedex - FRANCE
Tel: (33) 93 65 78 57 - Fax: (33) 93 65 76 43
email: [email protected], [email protected], [email protected]

Abstract

In this report, we are interested in massively parallel multiscale relaxation algorithms applied to image classification. It is well known that multigrid methods can significantly improve the convergence rate and the quality of the final results of iterative relaxation techniques. First, we present a classical multiscale model which consists of a label pyramid and a whole observation field. The potential functions of coarser grids are derived by simple computations. The optimization problem is first solved at the higher scale by a parallel relaxation algorithm, then the next lower scale is initialized by a projection of the result. Second, we propose a hierarchical Markov Random Field model based on this classical model. We introduce new interactions between neighboring levels in the pyramid. It can also be seen as a way to incorporate cliques with far-apart sites at a reasonable price. This model results in a relaxation algorithm with a new annealing scheme, Multi-Temperature Annealing (MTA), which consists of associating higher temperatures with higher levels, in order to be less sensitive to local minima at coarser grids. The convergence to the global optimum is proved by a generalisation of the annealing theorem of Geman and Geman.

Key Words

Markov Random Fields, multiscale, hierarchical model, relaxation algorithms, supervised image classification.

Acknowledgments

This work has been partially funded by CNES (French Aerospace Agency), AFIRST (French-Israeli Collaboration) and GdR TdSI (DRED/CNRS).


Contents

1 Introduction
2 Markov Random Fields
  2.1 Neighborhood Systems
  2.2 Gibbs Distribution and MRF's
  2.3 A General Markov Image Model
  2.4 Relaxation Algorithms
3 The Classical Multiscale Model
  3.1 General Description
  3.2 Relaxation Scheme
  3.3 A Special Case
4 The Hierarchical Model
  4.1 General Description
  4.2 A Special Case
5 Multi-Temperature Annealing
  5.1 Parallel Relaxation Scheme
  5.2 Complexity
  5.3 Convergence Study
    5.3.1 Homogeneous Annealing
    5.3.2 Inhomogeneous Annealing
    5.3.3 Multi-Temperature Annealing
6 Application to Supervised Image Classification
7 Experimental Results
  7.1 Comparison of the Schedules
  7.2 Comparison of the Models
8 Conclusion
A Proof of the Multi-Temperature Annealing Theorem
  A.1 Notation
  A.2 Proof of the Theorem
B Images
C Tables


List of Figures

1 First order neighborhood system with cliques
2 Updating sets in the case of a first order MRF
3 Parallelized Markov chains ($T_0 > T_1 > T_2 > T_3 > T_4$)
4 The isomorphism $\Phi^i$ between $B^i$ and $S^i$
5 The multiscale relaxation scheme
6 The two subsets of $\mathcal{C}$ in the case of a first order neighborhood system (a: $C^i_k$; b: $C^i_{k,l}$)
7 The functions $\Psi$ and $\Psi^{-1}$
8 The neighborhood system $\bar{\mathcal{G}}$ and the cliques $\bar{C}_1$, $\bar{C}_2$ and $\bar{C}_3$
9 Relaxation scheme on the pyramid (the levels connected by pointers are updated at the same time)
10 Memory complexity of the hierarchical model
11 Communication scheme of the hierarchical model
12 Energy decrease with the MTA schedule
13 Energy decrease with the inhomogeneous annealing schedule
14 Results of the Gibbs sampler on a synthetic image with inhomogeneous and MTA schedules
15 Histogram of the SPOT image with 6 classes
16 Results on a synthetic image with 2 classes
17 Results on a synthetic image with 4 classes
18 Results on a synthetic image with 16 classes
19 Results on a SPOT image with 4 classes
20 Results on an indoor scene with 4 classes
21 Results on a medical image with 4 classes
22 Original SPOT image with 6 classes
23 Ground truth data
24 Results of the ICM algorithm; comparison with ground truth data
25 Results of the Gibbs Sampler; comparison with ground truth data


List of Tables

1 Results on a noisy synthetic image with 2 classes
2 Results on a noisy synthetic image with 4 classes
3 Results on a noisy synthetic image with 16 classes
4 Results on a SPOT image with 4 classes
5 Results on the SPOT image with 6 classes
6 Original energy function $U$
7 Modified energy function $U'$


Notations

$S = \{s_1, s_2, \ldots, s_N\}$ : set of sites or pixels
$\mathcal{G} = \{\mathcal{G}_s \mid s \in S\}$ : neighborhood system over $S$
$\mathcal{G}_s$ : neighborhood of $s$
$C \subseteq S$ : a clique
$\mathcal{C}$ : set of cliques
$\mathcal{C}_s$ : set of cliques containing $s$
$\deg(\mathcal{C})$ : degree of the cliques
$L$ : lattice defined on $S$ ($s = (i,j)$)
$\mathcal{G}^n$ : homogeneous neighborhood system of order $n$
$X = \{X_s : s \in S\}$ : MRF over $S$
$\Lambda = \{0, 1, \ldots, L-1\}$ : common state space for $X_s$
$\Omega$ : set of all possible configurations
$\omega = (\omega_{s_1}, \ldots, \omega_{s_N})$, $\omega_{s_i} \in \Lambda$, $1 \le i \le N$ : any configuration of $X$
$\pi(\omega)$ : Gibbs distribution on $\Omega$
$Z$ : partition function
$T$ : temperature
$U(\omega)$ : energy function
$V_C(\omega)$ : potential of clique $C$
$\mathcal{F} = \{f_s : s \in S\}$ : observed data
$\{n_k, k = 1, 2, \ldots\}$ : updating order of the sites
$W = w^n$ : width of the lattice $L$
$H = h^m$ : height of the lattice $L$
$M = \inf(n, m)$ : the highest level of the pyramid (we have $M+1$ levels)
$B^i = \{b^i_1, \ldots, b^i_{N_i}\}$ : $i$th scale
$b^i_k$ : $k$th block at the $i$th scale
$N_i = N/(wh)^i$ : number of blocks at the $i$th scale
$\omega^i_k \in \Lambda$ : common label of the block $b^i_k$
$\Omega^i$ : configuration space at the $i$th scale
$C^i_j$ : a clique of order $j$ at scale $i$
$\mathcal{C}^i$ : set of cliques at scale $i$
$D_{C^i_j}$ : set of cliques included in the clique $C^i_j$ at scale $i$
$A^i_j$ : set of cliques included in any clique of order $j$ at scale $i$
$V^{B^i}_{C^i_j}$ : potential of the clique $C^i_j$ at scale $i$
$S^i$ : grid at the $i$th level of the pyramid
$\Xi^i = \{\xi^i_s : s \in S^i, \xi^i_s \in \Lambda\}$ : configuration space of the $i$th level of the pyramid
$\Phi^i$ : isomorphism between $S^i$ and $B^i$
$U^i(\xi^i)$ : energy function at level $i$
$V^i_{C^i}(\xi^i)$ : clique potentials at level $i$
$\bar{S} = \{\bar{s}_1, \ldots, \bar{s}_{\bar{N}}\}$ : set of sites of the pyramid
$\bar{N}$ : number of sites of the pyramid
$\bar{\Omega}$ : configuration space of the pyramid
$\bar{\omega}$ : a configuration of the pyramid
$\Psi$ : projection function between two neighboring levels
$\bar{\mathcal{G}}$ : neighborhood system on the pyramid
$\mathcal{G}^i$ : neighborhood system at level $i$
$\bar{\mathcal{C}}$ : set of cliques over the pyramid
$\mathcal{C}^\star$ : set of the cliques located between two neighboring levels
$\bar{X}$ : MRF over the pyramid
$\bar{U}(\bar{\omega})$ : energy function on the pyramid
$\bar{V}_{\bar{C}}(\bar{\omega})$ : clique potentials on the pyramid
$U^\star(\bar{\omega})$ : energy over the cliques $\mathcal{C}^\star$
$P_{\omega\eta}(k-1, k)$ : probability of the $k$th transition $\omega \to \eta$
$X(k)$ ($k = 1, 2, \ldots$) : Markov chain generated by the Simulated Annealing algorithm
$P_{\omega\eta}(T)$ : transition matrix
$G_{\omega\eta}(T)$ : generation matrix
$A_{\omega\eta}(T)$ : acceptance matrix
$\Omega_{\mathrm{opt}}$ : set of globally optimal configurations
$T(k, C)$ : temperature function depending on the iteration $k$ and on the cliques $C$
$\pi_{T(k,C)}(\omega)$ : Gibbs distribution with temperature $T(k, C)$
$U(\omega)/T(k, \cdot)$ : shorthand for the operation $\sum_{C \in \mathcal{C}} V_C(\omega)/T(k, C)$
$\pi_0$ : uniform distribution on $\Omega_{\mathrm{opt}}$
$U^{\sup}$ : maximum value of the energy function $U(\omega)$
$U^{\inf}$ : minimum value of the energy function $U(\omega)$
$\Delta$ : difference between the maximum and minimum of $U(\omega)$
$T^{\inf}_k$ : minimum of $T(k, C)$ at the $k$th iteration
$\mu_\lambda$ : mean value of class $\lambda \in \Lambda$
$\sigma_\lambda$ : deviation of class $\lambda \in \Lambda$
$\beta$ : second order clique potential at the finest level
$\gamma$ : second order clique potential between neighboring levels
$P(k, \omega \mid l, \eta)$ : the transition probability $P(X(k) = \omega \mid X(l) = \eta)$
$P(k, \omega \mid l, \pi)$ : shorthand for $\sum_\eta P(X(k) = \omega \mid X(l) = \eta)\,\pi(\eta)$
$P(k, \cdot \mid l, \eta)$ : the "$\cdot$" means any configuration here
$\|\pi - \pi'\|$ : the $L_1$ norm of two distributions on $\Omega$


1 Introduction

Markov Random Fields (MRF) have become more and more popular during the last few years in image processing [1, 7, 10, 12, 14, 30]. One good reason is that such a model requires the least a priori information about the world model. On the other hand, the local behavior of MRFs makes it possible to develop highly parallel algorithms for solving the combinatorial optimization problem associated with such a model.

In this report, we are interested in massively parallel multiscale relaxation algorithms applied to image classification [8, 9, 16, 26, 28]. It is well known that multigrid methods can significantly improve the convergence rate and the quality of the final results of iterative relaxation techniques.

There are many approaches to multigrid image segmentation. F. Marques et al. [26] propose a hierarchical Compound Gauss-Markov Random Field model with a label pyramid and an observation pyramid. Bouman [8, 9] proposes a multiscale MRF model, where each scale is causally dependent on the coarser grid field above it. This model yields a non-iterative segmentation algorithm and direct methods of parameter estimation. The basis of our approach is a consistent multiscale MRF model proposed by F. Heitz et al. in [16, 28] for motion analysis. This model consists of a label pyramid and a whole observation field. The original energy function can be decomposed as a sum of potential functions which are defined on neighboring blocks and depend only on the labels associated with these blocks and on the observation field. Using this decomposition, the parameters of coarser grids can be computed very easily. This model results in a multigrid relaxation scheme which replaces the original optimization problem by a sequence of more tractable problems. Using a top-down strategy in the label pyramid, the optimization problem is first solved at a higher level, then the lower grid is initialized with the previous result by a simple projection. This algorithm is very efficient in the case of deterministic relaxation (for instance ICM [4, 18]), which gets stuck in a local minimum near the starting configuration. In the case of stochastic relaxation (for instance Simulated Annealing [13, 24, 27]), which is far less dependent on the initial configuration, the results are only slightly better, but the method remains interesting with respect to computer time, especially on a sequential machine. After a brief introduction to the theory of Markov Random Fields (Section 2), we give a general description of this model and the relaxation scheme associated with it in Section 3.

Then, we propose a new hierarchical MRF model defined on the whole label pyramid (Section 4). In this model, we have introduced a new interaction scheme between neighboring levels in the pyramid, yielding a better communication between the grids. It can also be seen as a way to incorporate cliques with far-apart sites at a reasonable price. This model gives a relaxation algorithm with a new annealing scheme which can be run in parallel on the entire pyramid. The basic idea of this annealing scheme, which we propose to call Multi-Temperature Annealing (MTA), is the following: with the higher levels, we associate higher temperatures, which make the algorithm less sensitive to local minima. At a finer resolution, however, the relaxation is performed at a lower temperature (at the bottom level, it is close to 0). The complete convergence study of the relaxation algorithm in the cases of the homogeneous, inhomogeneous and Multi-Temperature Annealing schedules can be found in Section 5. In the multi-temperature case, our annealing theorem is a generalisation of the well-known theorem of Geman and Geman [13], and the proof can be found in Appendix A.


In Section 6, we apply these models to supervised image classification. Using a first order MRF model to take the context into account, and a Gaussian representation of the classes, we define the energy function for the monogrid, multiscale and hierarchical models.

Finally, experiments with the Gibbs sampler [13] and the Iterated Conditional Mode algorithm [4, 18] are shown in Section 7, using the three models for each algorithm (monogrid, multiscale and hierarchical). These methods have been implemented in parallel on a Connection Machine CM200 [17].

2 Markov Random Fields

First, we briefly give an introduction to the theory of Markov Random Fields (MRF) [1, 29], then we describe a general image model used in the following sections. Finally, we recall a few classical relaxation algorithms used for the optimization of the cost function of the model.

2.1 Neighborhood Systems

Let $S = \{s_1, s_2, \ldots, s_N\}$ be a set of sites.

Definition 2.1 (Neighborhood system) $\mathcal{G} = \{\mathcal{G}_s \mid s \in S\}$ is a neighborhood system for $S$ if

1. $s \notin \mathcal{G}_s$
2. $s \in \mathcal{G}_r \Leftrightarrow r \in \mathcal{G}_s$

Definition 2.2 (Clique) A subset $C \subseteq S$ is a clique if every pair of distinct sites in $C$ are neighbors. $\mathcal{C}$ denotes the set of cliques and $\deg(\mathcal{C}) = \max_{C \in \mathcal{C}} |C|$.

The most commonly used neighborhood systems are the homogeneous systems. In this case, we consider $S$ as a lattice $L$ and define these neighborhoods as

$$\mathcal{G}^n = \{\mathcal{G}^n_{(i,j)} : (i,j) \in L\}, \qquad \mathcal{G}^n_{(i,j)} = \{(k,l) \in L : (k-i)^2 + (l-j)^2 \le n\}.$$

[Figure 1: First order neighborhood system with its cliques.]

Obviously, sites near the boundary have fewer neighbors than interior ones. Furthermore, $\mathcal{G}^0$ is the degenerate system in which all neighborhoods are empty (its cliques are the singletons $\{s\}$), and for all $n \ge 0$: $\mathcal{G}^n \subseteq \mathcal{G}^{n+1}$. Figure 1 shows a first-order neighborhood corresponding to $n = 1$. The cliques are $\{(i,j)\}$, $\{(i,j), (i,j+1)\}$ and $\{(i,j), (i+1,j)\}$.


2.2 Gibbs Distribution and MRF's

Let $X = \{X_s : s \in S\}$ denote a family of random variables such that $\forall s \in S : X_s \in \Lambda$, where $\Lambda = \{0, 1, \ldots, L-1\}$ is a common state space. Let $\Omega = \{\omega = (\omega_{s_1}, \ldots, \omega_{s_N}) : \omega_{s_i} \in \Lambda, 1 \le i \le N\}$ be the set of all possible configurations.

Definition 2.3 (Markov Random Field) $X$ is a Markov Random Field (MRF) with respect to $\mathcal{G}$ if

1. for all $\omega \in \Omega$: $P(X = \omega) > 0$,
2. for every $s \in S$ and $\omega \in \Omega$: $P(X_s = \omega_s \mid X_r = \omega_r, r \ne s) = P(X_s = \omega_s \mid X_r = \omega_r, r \in \mathcal{G}_s)$.

The functions in 2 are called the local characteristics of the MRF, and the probability distribution $P(X = \omega)$ of any process satisfying 1 is uniquely determined by these conditional probabilities. However, it is extremely difficult to determine these characteristics in practice.

Definition 2.4 (Gibbs distribution) A Gibbs distribution relative to the neighborhood system $\mathcal{G}$ is a probability measure $\pi$ on $\Omega$ with the following representation:

$$\pi(\omega) = \frac{1}{Z} \exp\left(-\frac{U(\omega)}{T}\right), \tag{1}$$

where $Z$ is the normalizing constant or partition function:

$$Z = \sum_{\omega} \exp\left(-\frac{U(\omega)}{T}\right),$$

$T$ is a constant called the temperature, and the energy function $U$ is of the form

$$U(\omega) = \sum_{C \in \mathcal{C}} V_C(\omega). \tag{2}$$

Each $V_C$ is a function on $\Omega$ depending only on those elements $\omega_s$ of $\omega$ for which $s \in C$. Such a function is called a potential.

Probably one of the most important theorems is the Hammersley-Clifford theorem [1], which points out the relation between MRFs and Gibbs distributions:

Theorem 2.1 (Hammersley-Clifford) $X$ is a MRF with respect to the neighborhood system $\mathcal{G}$ if and only if $\pi(\omega) = P(X = \omega)$ is a Gibbs distribution with respect to $\mathcal{G}$.

The main benefit of this equivalence is that it provides a simple way to specify MRFs, namely by specifying potentials instead of local characteristics (see Definition 2.3), which is usually very difficult.
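As an illustration (ours, not part of the report), the Gibbs distribution of Definition 2.4 can be computed exhaustively on a tiny lattice; the Potts-style pair potential and all numeric values are assumptions chosen for the example:

```python
# The Gibbs measure (1) on a 2x2 lattice with two labels and pair-clique potentials (2).
from itertools import product
from math import exp

SITES = [(0, 0), (0, 1), (1, 0), (1, 1)]
PAIRS = [((0, 0), (0, 1)), ((1, 0), (1, 1)), ((0, 0), (1, 0)), ((0, 1), (1, 1))]
BETA, T = 1.0, 1.0  # illustrative potential strength and temperature

def U(omega):
    """Energy (2): +beta for each disagreeing pair clique, -beta otherwise."""
    return sum(BETA if omega[a] != omega[b] else -BETA for (a, b) in PAIRS)

configs = [dict(zip(SITES, labels)) for labels in product((0, 1), repeat=len(SITES))]
Z = sum(exp(-U(w) / T) for w in configs)                        # partition function
pi = {tuple(w.values()): exp(-U(w) / T) / Z for w in configs}   # Gibbs measure (1)

assert abs(sum(pi.values()) - 1.0) < 1e-12
print(max(pi, key=pi.get))  # prints one of the two constant labelings, the modes of pi
```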


2.3 A General Markov Image Model

We now look at the image labeling model. Image labeling is a general framework to solve low-level vision tasks, such as image classification, edge detection, etc. To each pixel of the image, we assign a label. The meaning of the labels depends on the task that we want to solve: for image classification, for example, a label means a class; for edge detection, it means the presence or the direction of an edge; etc. Thus, we have the following general problem:

We are given a set of pixels (an image) $S = \{s_1, s_2, \ldots, s_N\}$ with some neighborhood system $\mathcal{G} = \{\mathcal{G}_s : s \in S\}$, and a set of image data (or observations) $\mathcal{F} = \{f_s : s \in S\}$. Each of these pixels may take a label from $\Lambda = \{0, 1, \ldots, L-1\}$. The configuration space $\Omega$ is the set of all global discrete labelings $\omega = (\omega_{s_1}, \ldots, \omega_{s_N})$, $\omega_s \in \Lambda$. We assume that $X$ is a MRF relative to $\mathcal{G}$ with a corresponding energy function $U_2$ and potentials $\{V_C\}$:

$$P(X = \omega) = \frac{1}{Z} \exp\left(-\frac{U_2(\omega)}{T}\right), \qquad U_2(\omega) = \sum_{C \in \mathcal{C}} V_C(\omega).$$

Now we construct a Bayesian estimator to find the optimal labeling, that is, the labeling which maximizes the posterior distribution $P(X = \omega \mid \mathcal{F})$ of the label field:

$$P(X = \omega \mid \mathcal{F}) = \frac{P(\mathcal{F} \mid X = \omega)\, P(X = \omega)}{P(\mathcal{F})}. \tag{3}$$

Since $P(\mathcal{F})$ is constant, the MAP estimate of the label field is given by:

$$\max_{\omega \in \Omega} P(X = \omega \mid \mathcal{F}) = \max_{\omega \in \Omega} P(\mathcal{F} \mid X = \omega)\, P(X = \omega). \tag{4}$$

If we assume that the observed image $\mathcal{F}$ is affected at site $s$ only by the pixel $s$ itself (i.e. the image is not blurred), one can prove that $P(\mathcal{F} \mid X = \omega)$ is a Gibbs distribution over the degenerate system $\mathcal{G}^0$ (whose cliques are the singletons $\{s\}$), with an energy function $U_1$ and potentials $V_{\{s\}}$ (a blurred image model is studied, for example, in [13]). Thus, the posterior distribution is also a MRF over $\mathcal{G}$ with the following energy function:

$$U(\omega) = U_1(\omega) + U_2(\omega), \text{ where} \tag{5}$$

$$U_1(\omega) = \sum_{s \in S} V_{\{s\}}(\omega_s) \text{ and} \tag{6}$$

$$U_2(\omega) = \sum_{C \in \mathcal{C}} V_C(\omega_C). \tag{7}$$

Using this function, the MAP estimate is given by:

$$\hat{\omega} = \arg\max_{\omega \in \Omega} P(X = \omega \mid \mathcal{F}) = \arg\max_{\omega \in \Omega} \frac{1}{Z} \exp\left(-\frac{U(\omega)}{T}\right) = \arg\min_{\omega \in \Omega} U(\omega). \tag{8}$$
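To make Equation (8) concrete, here is a toy exhaustive MAP search (ours, not the report's code); the Gaussian singleton potentials anticipate the classification model of Section 6, and all numeric values are illustrative:

```python
# MAP estimate (8) on a toy 2x2 image, minimizing U = U1 + U2 by enumeration.
from itertools import product
from math import log, sqrt, pi as PI

F = {(0, 0): 0.9, (0, 1): 1.1, (1, 0): 0.2, (1, 1): 1.0}   # toy observations
MU, SIGMA = [0.0, 1.0], [0.3, 0.3]                          # class means / deviations
PAIRS = [((0, 0), (0, 1)), ((1, 0), (1, 1)), ((0, 0), (1, 0)), ((0, 1), (1, 1))]
BETA = 0.5

def U(omega):
    u1 = sum(log(sqrt(2 * PI) * SIGMA[omega[s]])                       # U1: data term
             + (F[s] - MU[omega[s]]) ** 2 / (2 * SIGMA[omega[s]] ** 2) for s in F)
    u2 = sum(BETA if omega[a] != omega[b] else -BETA for (a, b) in PAIRS)  # U2: smoothness
    return u1 + u2

best = min((dict(zip(F, lab)) for lab in product((0, 1), repeat=4)), key=U)
print(best)  # pixel (1,0) with f=0.2 stays in class 0; the others go to class 1
```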


[Figure 2: Updating sets in the case of a first order MRF.]

[Figure 3: Parallelized Markov chains ($T_0 > T_1 > T_2 > T_3 > T_4$).]

2.4 Relaxation Algorithms

It is usually an extremely hard computational problem to find even a near-optimal solution of Equation (8). Standard optimization algorithms do not work in this case, since $U$ is generally a non-convex function. Several approaches have been proposed to solve this task, such as Simulated Annealing (SA) [13, 24, 27], Graduated Non-Convexity (GNC) [7], Iterated Conditional Mode (ICM) [4, 18], Mean Field Annealing (MFA) [6, 12, 30], Game Strategy Annealing (GSA) [25], Deterministic Pseudo Annealing (DPA) [22, 3], and Modified Metropolis Dynamics (MMD) [22, 23]. Multigrid schemes have also proved to be very efficient for energy minimization [8, 9, 16, 28].

Herein, we review two well-known methods. The first one is the Iterated Conditional Mode (ICM) proposed by Besag in [4]. This algorithm realizes a deterministic search for the MAP estimate given in Equation (8). An advantage of the method is that its convergence is much faster than that of the stochastic methods. However, the optimum it reaches is only a local one, located near the initial configuration; a good initialization is therefore vital for this method. The sequential ICM algorithm is described as follows. Let $\{n_k, k = 1, 2, \ldots\}$ be the sequence in which the sites are visited for updating, and set the temperature parameter $T$ to 1 throughout the algorithm.

Algorithm 2.1 (ICM)

1. Choose a good initial configuration $\omega^0$, set $k = 0$ and let $\epsilon > 0$ be a threshold.

2. $\omega^{k+1} = \omega^k|_{\omega^k_{n_k} = \lambda}$, where $\lambda = \arg\min_{\lambda \in \Lambda} U(\omega^k|_{\omega^k_{n_k} = \lambda})$ (that is, the label at site $n_k$ is replaced by the one of minimal conditional energy).

3. $k = k + 1$ and go to 2 until $|U(\omega^k) - U(\omega^{k-1})| < \epsilon$.
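A minimal sequential sketch of Algorithm 2.1 (ours, not the authors' implementation), using the Gaussian singleton plus Potts pair energy that Section 6 adopts; all parameter values are illustrative:

```python
# Sequential ICM: raster scan plays the role of {n_k}; the stopping test on
# |U(w^k) - U(w^{k-1})| is replaced by the equivalent test that no label changed.
import numpy as np

def icm(f, mu, sigma, beta=0.5, max_sweeps=100):
    """f: observed image (H, W); mu, sigma: per-class Gaussian mean / deviation."""
    H, W = f.shape
    labels = np.argmin([(f - m) ** 2 for m in mu], axis=0)  # simple ML initialization

    def local_energy(i, j, lam):
        e = np.log(np.sqrt(2 * np.pi) * sigma[lam]) \
            + (f[i, j] - mu[lam]) ** 2 / (2 * sigma[lam] ** 2)
        for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):   # first order neighbors
            if 0 <= i + di < H and 0 <= j + dj < W:
                e += beta if labels[i + di, j + dj] != lam else -beta
        return e

    for _ in range(max_sweeps):
        changed = 0
        for i in range(H):
            for j in range(W):
                best = min(range(len(mu)), key=lambda lam: local_energy(i, j, lam))
                changed += int(best != labels[i, j])
                labels[i, j] = best
        if changed == 0:   # fixed point: a local minimum of U near the initialization
            break
    return labels
```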

The second algorithm is the Gibbs Sampler introduced by D. Geman and S. Geman in [13]. This method converges to the global optimum (see the proof in [13]) but the convergence is very slow.


Let again $\{n_k\}$ be the sequence in which the sites are visited. The temperature parameter is lowered here after each complete sweep of $S$.

Algorithm 2.2 (Gibbs Sampler)

1. Set $k = 0$, assign an initial configuration $\omega^0$ and let $\epsilon > 0$ be a threshold.

2. $\omega^{k+1} = \eta$ with probability $\pi_T(\eta) \propto \exp(-U(\eta)/T)$, where $\eta$ ranges over the configurations of the form $\omega^k|_{\omega^k_{n_k} = \lambda}$, $\lambda \in \Lambda$ (that is, the new label at site $n_k$ is sampled from the local conditional distribution).

3. $k = k + 1$, and $T$ is lowered if a full sweep is completed. Go to 2 until $|U(\omega^k) - U(\omega^{k-1})| < \epsilon$.
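A minimal sketch of Algorithm 2.2 (ours, not the report's code); the exponential cooling used here is an assumption chosen for brevity, not the schedule analysed in Section 5:

```python
# Gibbs sampler with annealing: each visited site draws its new label from the
# local conditional distribution derived from Equation (1).
import numpy as np

def gibbs_sampler(f, mu, sigma, beta=0.5, T0=4.0, cooling=0.95, sweeps=200, rng=None):
    rng = rng or np.random.default_rng(0)
    H, W = f.shape
    L = len(mu)
    labels = rng.integers(0, L, size=(H, W))
    T = T0
    for _ in range(sweeps):
        for i in range(H):
            for j in range(W):
                e = np.array([np.log(np.sqrt(2 * np.pi) * sigma[l])
                              + (f[i, j] - mu[l]) ** 2 / (2 * sigma[l] ** 2)
                              for l in range(L)])
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    if 0 <= i + di < H and 0 <= j + dj < W:
                        e += np.where(labels[i + di, j + dj] != np.arange(L), beta, -beta)
                p = np.exp(-(e - e.min()) / T)       # local conditional probabilities
                labels[i, j] = rng.choice(L, p=p / p.sum())
        T *= cooling                                 # lower T after each full sweep
    return labels
```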

The sequence $\{n_k, k = 1, 2, \ldots\}$ is usually a simple raster scan. A more sophisticated scanning is, for example, Highest Confidence First (HCF) [11], where the sites with large likelihood ratios of one label over the others are visited earlier.

Let us now consider the parallelization [2]. We use a coding technique, similar to the one reported in [4], which requires no modification of the theoretical background of the methods. Considering a SIMD (Single Instruction Multiple Data) machine, we update at the same time the sites which are conditionally independent. More precisely, we partition the set $S$ into disjoint regions (coding regions or updating sets) such that sites belonging to the same region are conditionally independent given the data of all the other regions. In this way, we can update the sites of a whole region at each iteration, instead of one site. Figure 2 shows such a coding in the case of a first order MRF; a small sketch follows.
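For a first order MRF the coding reduces to a checkerboard, as the following sketch (ours) makes explicit:

```python
# With 4-connectivity, the two checkerboard colors are conditionally independent,
# so each color can be updated simultaneously on a SIMD machine.
import numpy as np

H, W = 4, 6
i, j = np.mgrid[0:H, 0:W]
black = (i + j) % 2 == 0          # updating set 1
white = ~black                    # updating set 2
# No site in `black` is a 4-neighbor of another `black` site, so a full sweep is
# two parallel steps: update all black sites, then all white sites.
print(black.astype(int))
```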

Another scheme has been proposed by Graffigne in [15]. In this case, each processor generates a Markov chain at a different temperature (see Figure 3). After a certain number of iterations, the processors at higher temperatures transfer their configurations to the lower-temperature processors. Such a configuration is accepted only if it is better (i.e. of lower energy) than the current one. This algorithm can be implemented on either a MIMD (Multiple Instruction Multiple Data) or a SIMD machine, assuming in the latter case that we have the same relaxation algorithm for each chain. The only problem with this scheme is that it is not equivalent to its sequential version, and the convergence of such a method has not been proved.

We use the idea of performing relaxation at different temperatures in our Multi-Temperature Annealing schedule (see Section 5) but, in our case, the convergence has been proved (see Appendix A).

3 The Classical Multiscale Model

This multiscale model was proposed by F. Heitz et al. in [16] for motion analysis, using a second order neighborhood system (see Definition 2.1). Herein, we give a more general description of this model, then we present the energy-minimization scheme, and finally we study it in the case of a first order neighborhood system (see Figure 1), which is the most commonly used in image classification problems.


[Figure 4: The isomorphism $\Phi^i$ between $B^i$ and $S^i$.]

3.1 General Description

Let us suppose again that $S = \{s_1, \ldots, s_N\}$ is a $W \times H$ lattice, so that

$$S \equiv L = \{(i,j) : 1 \le i \le W \text{ and } 1 \le j \le H\}, \tag{9}$$

and¹ $W = w^n$, $H = h^m$. Furthermore, we have some neighborhood system $\mathcal{G}$ on these sites. Let $X$ be a MRF over $\mathcal{G}$ with an energy function $U$ and potentials $\{V_C\}_{C \in \mathcal{C}}$. The following procedure generates the multigrid MRF corresponding to $X$:

1. Let $B^0 \equiv S$ and $\Omega^0 \equiv \Omega$.

2. For all $1 \le i \le M$ ($M = \inf(n, m)$), $S$ is divided into blocks of size $w^i \times h^i$. These blocks form the scale $B^i = \{b^i_1, \ldots, b^i_{N_i}\}$ ($N_i = N/(wh)^i$).

The labels assigned to the sites of a block are supposed to be the same over the whole block. The common label of the block $b^i_k$ is denoted by $\omega^i_k \in \Lambda$. This constraint yields a configuration space $\Omega^i$ which is a subset of the original set $\Omega$. Obviously, for all $0 \le i \le M$: $\Omega^i \subset \Omega^{i-1} \subset \cdots \subset \Omega^0 \equiv \Omega$.

Now, let us consider the neighborhood system at scale $i$. It is clear that $b^i_k$ and $b^i_l$ are neighbors if and only if there exist two neighboring sites $s \in S$ and $r \in S$ such that $s \in b^i_k$ and $r \in b^i_l$. This yields the same cliques as in $\mathcal{C}$. The cliques can be defined in the following way.

¹ This assumption introduces some restrictions on $L$, but this is not crucial in practice since we work mostly on images where both $W$ and $H$ are powers of 2.


Let $d = \deg(\mathcal{C})$. For all $1 \le j \le d$, a set of $j$ blocks $C^i_j$ at scale $i$ is a clique of order $j$ if there exists a clique $C \in \mathcal{C}$ (that is, a clique at the finest scale) such that:

1. $C \subseteq \bigcup_{b^i_k \in C^i_j} b^i_k$;
2. $\forall b^i_k \in C^i_j : C \cap b^i_k \ne \emptyset$.

The set of cliques at scale $i$ is denoted by $\mathcal{C}^i$ ($\mathcal{C}^0 \equiv \mathcal{C}$). The set of all cliques satisfying 1 and 2 for a given $C^i_j$ is denoted by $D_{C^i_j} \subseteq \mathcal{C}$.

Let us partition the original set $\mathcal{C}$ into the following disjoint subsets: for all $1 \le j \le d$, let $A^i_j$ be the set of cliques $C \in \mathcal{C}$ for which there exists a clique $C^i_j$ (that is, a clique of order $j$ at scale $i$) satisfying 1 and 2. It then follows from the definitions of $D_{C^i_j}$ and $A^i_j$ that

$$A^i_j = \bigcup_{C^i_j \in \mathcal{C}^i} D_{C^i_j}. \tag{10}$$

Using this partition, the energy function $U$ can be decomposed in the following way:

$$U(\omega) = \sum_{C \in \mathcal{C}} V_C(\omega) = \sum_{C \in A^i_1} V_C(\omega) + \cdots + \sum_{C \in A^i_d} V_C(\omega) \tag{11}$$

$$= \sum_{C^i_1 \in \mathcal{C}^i} \sum_{C \in D_{C^i_1}} V_C(\omega) + \cdots + \sum_{C^i_d \in \mathcal{C}^i} \sum_{C \in D_{C^i_d}} V_C(\omega). \tag{12}$$

The main benefit of this decomposition is that the potentials at coarser scales can be derived by simple computation from the potentials at the finest scale. If we denote the potential corresponding to a clique $C^i_j$ of order $j$ at scale $i$ by $V^{B^i}_{C^i_j}$, we have the following family of potentials at scale $B^i$:

$$V^{B^i}_{C^i_j}(\omega) = \sum_{C \in D_{C^i_j}} V_C(\omega). \tag{13}$$

If we examine our model, we see that there is some redundancy at coarser scales: we have the same label over the sites of a block. It seems then natural to associate a unique site with each block. These sites take the common state of the corresponding block, and they form a coarser grid $S^i$ isomorphic to the corresponding scale $B^i$. The coarser configuration space $\Xi^i = \{\xi^i_s : s \in S^i, \xi^i_s \in \Lambda\}$ is isomorphic to $\Omega^i$. Obviously, $\Xi^0 \equiv \Omega^0 \equiv \Omega$. The isomorphism $\Phi^i$ from $S^i$ to $B^i$ is just a projection of the coarser label field to the fine grid $S^0 \equiv S$:

$$\Phi^i : \Xi^i \longrightarrow \Omega^i, \qquad \xi^i \longmapsto \omega = \Phi^i(\xi^i). \tag{14}$$

$\Phi^i$ keeps the same neighborhood structure on $S^i$ as on $B^i$, and the cliques on $S^i$ inherit the potentials from the cliques defined on $B^i$. These grids form a pyramid where level $i$ contains the grid $S^i$. The energy function of level $i$ ($i = 0, \ldots, M$) is of the form:

$$U^i(\xi^i) = \sum_{C^i \in \mathcal{C}^i} V^i_{C^i}(\xi^i), \quad i = 0, \ldots, M, \tag{15}$$

$$\text{where } V^i_{C^i}(\xi^i) = V^{B^i}_{C^i}(\Phi^i(\xi^i)). \tag{16}$$
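For intuition (this example is ours, not the report's), the projection $\Phi^i$ of Equation (14) amounts to replicating each coarse label over its block:

```python
# Projection Phi^2 for w = h = 2: a level-2 label field, one label per 4x4 block,
# replicated down to the finest grid S^0 = S.
import numpy as np

w = h = 2
i = 2
xi = np.array([[0, 1],
               [1, 1]])                                  # labels on the coarse grid S^2
omega = np.kron(xi, np.ones((w**i, h**i), dtype=int))    # the configuration Phi^2(xi)
print(omega.shape)  # (8, 8): every 4x4 block carries the common label of its block
```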


[Figure 5: The multiscale relaxation scheme.]

3.2 Relaxation Scheme

Basically, this is the same procedure as in the monogrid case. The only difference is that we have more functions to minimize, each less complex than the original one. The algorithm is the following (see Figure 5): instead of minimizing the original energy function $U$, we tackle the sequence of problems $U^i$ ($M \ge i \ge 0$) using a top-down strategy in the pyramid. First, we solve the problem at a higher level $i$ using a parallel relaxation scheme, then level $i-1$ is initialized by $(\Phi^{i-1})^{-1} \circ \Phi^i(\hat{\xi}^i)$, which is just a projection of $\hat{\xi}^i$ onto the finer grid $S^{i-1}$ ($\hat{\xi}^i$ is the solution at level $i$).

The advantages of this algorithm are clear: each $\hat{\xi}^i$ gives a more or less good estimate of the final result, and the estimate improves as $i$ goes down to 0. On the other hand, for the higher values of $i$, the corresponding problem is simpler since the configuration space has only a few elements.

This scheme is particularly well adapted to deterministic relaxation methods, which are more sensitive to the initial configuration than the stochastic ones. In our experiments (see Section 7), the final result is improved compared with the monogrid version of the same algorithm. For the stochastic methods, however, the final result is only slightly improved, since these methods are independent of the initial configuration.

[Figure 6: The two subsets of $\mathcal{C}$ in the case of a first order neighborhood system. a: $C^i_k$; b: $C^i_{k,l}$.]

Another important measure of efficiency is the speed of convergence. On a sequential machine, the proposed scheme exhibits fast convergence properties. However, on a SIMD machine, the speed depends mainly on the Virtual Processor Ratio (VPR = number of virtual processors per physical processor). This means that the monogrid scheme may be faster on such a machine, considering the (very simple) parallelization described above: faster, because the multiscale scheme usually demands more iterations (the relaxation algorithm must converge at each level, and there is a minimal number of iterations necessary for convergence). In our experiments, the monogrid scheme was always faster than this one on a Connection Machine CM200 (see Section 7).

We note that in [16] another parallelization scheme has been proposed, which consists of generating configurations in parallel, using different temperatures at different levels, with periodic interactions between them. The interaction introduces a transfer, every $n$ iterations, of a small block of labels to the next finer level. The block is accepted if the energy of the new block is lower (deterministic rule). We also implemented a finer version of this scheme. In our approach, each site at each iteration transfers its state to the next lower level. At the lower scale, this information is taken into account as the state of an additional neighboring site. The transition is then governed by the Gibbs Sampler or any other method, taking this external information into account (probabilistic rule).

The problem with both algorithms is that, to our knowledge, the convergence of such an algorithm has not been proved. Looking for a better parallelization scheme with a theoretical background may be future work.

3.3 A Special Case

In the following, we focus on a MRF with a first order neighborhood system (see Figure 1), where the energy function is given by:

$$U(\omega, \mathcal{F}) = U_1(\omega, \mathcal{F}) + U_2(\omega), \tag{17}$$

where $U_1$ (resp. $U_2$) denotes the energy of the first order (resp. second order) cliques. The notation $U_1(\omega, \mathcal{F})$ means that the first order potentials depend not only on the actual labeling but also on the given observations.

We follow the procedure described in Section 3.1 to generate a multigrid MRF model. Let $B^i = \{b^i_1, \ldots, b^i_{N_i}\}$ denote the set of blocks and $\Omega^i$ the configuration space at scale $i$ ($\Omega^i \subset \Omega^{i-1} \subset \cdots \subset \Omega^0 = \Omega$). The label associated with block $b^i_k$ is denoted by $\omega^i_k$. We can define the same neighborhood structure on $B^i$ as on $S$:

$$b^i_k \text{ and } b^i_l \text{ are neighbors} \iff b^i_k \ne b^i_l \text{ and } \exists C \in \mathcal{C} : C \cap b^i_k \ne \emptyset \text{ and } C \cap b^i_l \ne \emptyset. \tag{18}$$

Now, let us partition the original set $\mathcal{C}$ into two disjoint subsets $\{C^i_k\}$ and $\{C^i_{k,l}\}$:

1. cliques which are included in $b^i_k$ (see Figure 6a):
$$C^i_k = \{C \in \mathcal{C} \mid C \subseteq b^i_k\} \tag{19}$$

2. cliques which sit astride two neighboring blocks $\{b^i_k, b^i_l\}$ (see Figure 6b):
$$C^i_{k,l} = \{C \in \mathcal{C} \mid C \subseteq (b^i_k \cup b^i_l) \text{ and } C \cap b^i_k \ne \emptyset \text{ and } C \cap b^i_l \ne \emptyset\} \tag{20}$$

It is obvious from this partition that our energy function (see Equation (17)) can be decomposed as:

$$U_1(\omega, \mathcal{F}) = \sum_{s \in S} V_1(\omega_s, f_s) = \sum_{b^i_k \in B^i} V^{B^i}_1(\omega^i_k, \mathcal{F}), \qquad V^{B^i}_1(\omega^i_k, \mathcal{F}) = \sum_{s \in b^i_k} V_1(\omega_s, f_s), \tag{21}$$

and

$$U_2(\omega) = \sum_{C \in \mathcal{C}} V_2(\omega_C) = \sum_{b^i_k \in B^i} V^{B^i}_k(\omega^i_k) + \sum_{\{b^i_k, b^i_l\} \text{ neighbors}} V^{B^i}_{k,l}(\omega^i_k, \omega^i_l), \tag{22}$$

with

$$V^{B^i}_k(\omega^i_k) = \sum_{C \in C^i_k} V_2(\omega_C), \qquad V^{B^i}_{k,l}(\omega^i_k, \omega^i_l) = \sum_{C \in C^i_{k,l}} V_2(\omega_C).$$

Now we can define our pyramid (see Figure 4), where level $i$ contains the coarse grid $S^i$, which is isomorphic to the scale $B^i$. The coarse grid has a reduced configuration space $\Xi^i = \Lambda^{N_i}$. The model on the grids $S^i$ ($i = 0, \ldots, M$) defines a set of consistent multiscale MRF models, whose energy functions are derived from Equations (21) and (22):

$$U^i(\xi^i, \mathcal{F}) = U^i_1(\xi^i, \mathcal{F}) + U^i_2(\xi^i) = U_1(\Phi^i(\xi^i), \mathcal{F}) + U_2(\Phi^i(\xi^i)), \quad i = 0, \ldots, M, \tag{23}$$

with

$$U^i_1(\xi^i, \mathcal{F}) = \sum_{k \in S^i} \left( V^{B^i}_1(\omega^i_k, \mathcal{F}) + V^{B^i}_k(\omega^i_k) \right) = \sum_{k \in S^i} V^i_1(\xi^i_k, \mathcal{F}) \tag{24}$$

and

$$U^i_2(\xi^i) = \sum_{\{k,l\} \text{ neighbors}} V^{B^i}_{k,l}(\omega^i_k, \omega^i_l) = \sum_{C^i \in \mathcal{C}^i} V^i_2(\xi^i_C), \tag{25}$$

where $C^i$ is a second order clique corresponding to the definition in Equation (18) and $\mathcal{C}^i$ is the set of cliques on the grid $S^i$.
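To make Sections 3.1-3.3 concrete, here is a small end-to-end sketch (ours, not the authors' implementation). The coarse potentials are obtained by summing finest-scale potentials over blocks as in Equations (21) and (22); for a homogeneous Potts pair potential, the sum over the $w^i$ boundary cliques between two adjacent square blocks reduces to a multiplication by $w^i$. ICM is run top-down with projection between levels; all parameter values are illustrative:

```python
# Multiscale relaxation of Figure 5 with a first order Gaussian + Potts model.
import numpy as np

def coarse_data_term(f, mu, sigma, b):
    """V^{B^i}_1: per-block sum of singleton potentials, one slice per class."""
    H, W = f.shape
    terms = []
    for m, s in zip(mu, sigma):
        v = np.log(np.sqrt(2 * np.pi) * s) + (f - m) ** 2 / (2 * s ** 2)
        terms.append(v.reshape(H // b, b, W // b, b).sum(axis=(1, 3)))
    return np.stack(terms)                     # shape (L, H/b, W/b)

def icm_level(data, beta, init, sweeps=20):
    L, Hb, Wb = data.shape
    lab = init.copy()
    for _ in range(sweeps):
        for i in range(Hb):
            for j in range(Wb):
                e = data[:, i, j].copy()
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    if 0 <= i + di < Hb and 0 <= j + dj < Wb:
                        e += np.where(np.arange(L) != lab[i + di, j + dj], beta, -beta)
                lab[i, j] = int(np.argmin(e))
    return lab

def multiscale(f, mu, sigma, beta=0.5, M=2, w=2):
    """Top-down strategy: f's sides must be divisible by w**M (e.g. 64x64, M=2)."""
    lab = None
    for i in range(M, -1, -1):                 # i = M, ..., 0
        data = coarse_data_term(f, mu, sigma, w ** i)
        init = (np.argmin(data, axis=0) if lab is None
                else np.kron(lab, np.ones((w, w), dtype=int)))   # project to finer grid
        lab = icm_level(data, beta * w ** i, init)  # w^i boundary cliques per block pair
    return lab
```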


[Figure 7: The functions $\Psi$ and $\Psi^{-1}$.]

[Figure 8: The neighborhood system $\bar{\mathcal{G}}$ and the cliques $\bar{C}_1$, $\bar{C}_2$ and $\bar{C}_3$.]

4 The Hierarchical Model

In this section, we propose a new hierarchical MRF model. The basic idea is to find a better way of communication between the levels than the initialization used for the multiscale model. Our approach consists in introducing new interactions between two neighboring grids² in the pyramid. This scheme also permits the parallelization of the relaxation algorithm on the whole pyramid. First, we give a general description of the model, then we study a special case with a first order neighborhood system.

4.1 General Description

We consider here the label pyramid and the whole observation field defined in the previous section. Let $\bar{S} = \{\bar{s}_1, \ldots, \bar{s}_{\bar{N}}\}$ denote the sites of this pyramid. Obviously,

$$\bar{S} = \bigcup_{i=0}^{M} S^i, \qquad \bar{N} = \sum_{i=0}^{M} N_i. \tag{26}$$

² One can imagine interactions between more than two levels, but such schemes are too complicated for practical use.


$\bar{\Omega}$ denotes the configuration space of the pyramid:

$$\bar{\Omega} = \Xi^0 \times \Xi^1 \times \cdots \times \Xi^M = \{\bar{\omega} \mid \bar{\omega} = (\xi^0, \xi^1, \ldots, \xi^M)\}. \tag{27}$$

Let us define the following function between two neighboring levels, which assigns to a site of any level the corresponding block of sites at the level below it (that is, its descendants); $\Psi^{-1}$ assigns to a site its ancestor (see Figure 7):

$$\Psi : S^i \longrightarrow S^{i-1}, \qquad \Psi(\bar{s}) = \{\bar{r} \mid \bar{s} \in S^i \Rightarrow \bar{r} \in S^{i-1} \text{ and } b^{i-1}_{\bar{r}} \subseteq b^i_{\bar{s}}\}. \tag{28}$$

Now we can define the following neighborhood system on these sites (see Figure 8):

$$\bar{\mathcal{G}} = \left( \bigcup_{i=0}^{M} \mathcal{G}^i \right) \cup \{\Psi^{-1}(\bar{s}) \cup \Psi(\bar{s}) \mid \bar{s} \in \bar{S}\}, \tag{29}$$

where $\mathcal{G}^i$ is the neighborhood structure of the $i$th level, and we have the following cliques:

$$\bar{\mathcal{C}} = \left( \bigcup_{i=0}^{M} \mathcal{C}^i \right) \cup \mathcal{C}^\star, \tag{30}$$

where $\mathcal{C}^\star$ denotes the new cliques sitting astride two neighboring grids. We can easily estimate the degree of the new cliques, since it depends on the block size: each site interacts with its ancestor (there is one) and its descendants (there are $wh$), thus:

$$\deg(\mathcal{C}^\star) = \max_{C^\star \in \mathcal{C}^\star} |C^\star| = wh + 2 \tag{31}$$

$$\text{and} \quad \deg(\bar{\mathcal{C}}) = \deg(\mathcal{C}) + \deg(\mathcal{C}^\star) - 1. \tag{32}$$

Furthermore, let $\bar{X}$ be a MRF over $\bar{\mathcal{G}}$ with energy function $\bar{U}$ and potentials $\{\bar{V}_{\bar{C}}\}_{\bar{C} \in \bar{\mathcal{C}}}$. The energy function is of the following form:

$$\bar{U}(\bar{\omega}) = \sum_{\bar{C} \in \bar{\mathcal{C}}} \bar{V}_{\bar{C}}(\bar{\omega}) = \sum_{i=0}^{M} \sum_{C^i \in \mathcal{C}^i} V^i_{C^i}(\xi^i) + \sum_{C^\star \in \mathcal{C}^\star} \bar{V}_{C^\star}(\bar{\omega}) = \sum_{i=0}^{M} U^i(\xi^i) + U^\star(\bar{\omega}). \tag{33}$$

It turns out from the above equation that the energy function consists of two terms: the first one corresponds to the sum of the energy functions of the grids defined in the previous section, and the second one ($U^\star(\bar{\omega})$) is the energy over the new cliques located between neighboring grids.
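A small sketch (ours) of the bookkeeping behind Equations (26)-(33) for $w = h = 2$: $\Psi$ and $\Psi^{-1}$ become simple index arithmetic, and the pyramid energy splits into the per-level terms $U^i$ plus the inter-level term $U^\star$. The Potts-style potentials and the value of gamma are illustrative assumptions:

```python
import numpy as np

def descendants(r, c):
    """Psi: the 2x2 sites at the level below covered by site (r, c)."""
    return [(2 * r + dr, 2 * c + dc) for dr in (0, 1) for dc in (0, 1)]

def ancestor(r, c):
    """Psi^{-1}: the unique parent of site (r, c) at the level above."""
    return (r // 2, c // 2)

def pyramid_energy(levels, gamma=0.2, beta=0.5):
    """Equation (33): sum of per-level pair energies plus the inter-level term U*.
    levels[0] is the finest grid S^0; levels[i] has half the side of levels[i-1]."""
    u = 0.0
    for lab in levels:  # intra-level second order cliques (the U^i terms)
        u += beta * ((np.diff(lab, axis=0) != 0).sum() + (np.diff(lab, axis=1) != 0).sum())
    for i in range(1, len(levels)):  # U*: parent-child cliques between S^i and S^{i-1}
        rep = np.kron(levels[i], np.ones((2, 2), dtype=int))  # replicate parents
        u += gamma * (rep != levels[i - 1]).sum()             # penalize disagreement
    return u
```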


4.2 A Special Case

In this section, we study the model in the case of a first order neighborhood system. We consider herein only first and second order cliques; clique potentials for the other cliques are supposed to be 0. The cliques can be partitioned into three disjoint subsets $\bar{C}_1$, $\bar{C}_2$, $\bar{C}_3$, corresponding to first order cliques, second order cliques which are on the same level, and second order cliques which sit astride two neighboring levels (see Figure 8). Using this partition, we can derive the following energy function:

$$\bar{U}(\bar{\omega}, \mathcal{F}) = \bar{U}_1(\bar{\omega}, \mathcal{F}) + \bar{U}_2(\bar{\omega}) \tag{34}$$

$$\bar{U}_1(\bar{\omega}, \mathcal{F}) = \sum_{\bar{s} \in \bar{S}} \bar{V}_1(\bar{\omega}_{\bar{s}}, \mathcal{F}) = \sum_{i=0}^{M} \sum_{s^i \in S^i} V^i_1(\xi^i_{s^i}, \mathcal{F}) = \sum_{i=0}^{M} U^i_1(\xi^i, \mathcal{F}) \tag{35}$$

$$\bar{U}_2(\bar{\omega}) = \sum_{C \in \bar{C}_2} \bar{V}_2(\bar{\omega}_C) + \sum_{C \in \bar{C}_3} \bar{V}_2(\bar{\omega}_C) = \sum_{i=0}^{M} \sum_{C \in \mathcal{C}^i} V^i_2(\xi^i_C) + \sum_{C \in \bar{C}_3} \bar{V}_2(\bar{\omega}_C) = \sum_{i=0}^{M} U^i_2(\xi^i) + \sum_{C \in \bar{C}_3} \bar{V}_2(\bar{\omega}_C) \tag{36}$$

In the next section, we propose a new annealing scheme for the efficient minimization of the energy function of the hierarchical model.

5 Multi-Temperature Annealing

5.1 Parallel Relaxation Scheme

Now, let us see how the energy $\bar{U}(\bar{\omega})$ is minimized. If we use a deterministic relaxation method where the temperature parameter is kept constant during the iterations (for example ICM [4]), then the original formulation of the algorithm does not change. The only difference is that we work on a pyramid and not on a rectangular shape as in the monogrid case. We can easily parallelize this algorithm using the coding technique described by Besag in [4]: we partition the pyramid $\bar{S}$ into disjoint updating sets so that pixels which belong to the same set are conditionally independent, given the data of all the other sets. This enables us to update different levels at the same time (see Figure 9).

Let us now consider a relaxation algorithm where the temperature changes during the iterations. The temperature change is controlled by a function, the so-called annealing schedule. Such methods include Simulated Annealing (Gibbs Sampler [13], Metropolis algorithm [24, 27]) and some deterministic schemes such as Modified Metropolis Dynamics [21, 23]. For these algorithms, we introduce a new annealing schedule: the Multi-Temperature Annealing


[Figure 9: Relaxation scheme on the pyramid. The levels connected by pointers are updated at the same time.]

(MTA). The idea is to associate different temperatures with different levels in the pyramid. For the cliques sitting between two levels, we use either the temperature of the lower level or that of the higher level (but once chosen, we always keep the same level throughout the algorithm). For the parallelization [2], we use the same coding technique as in the previous case.

We have three ways of annealing. The first two are well known [24]; they require no modification of the original algorithm, except that we work on a pyramid instead of a rectangular shape. The third one is a new annealing schedule, which is the most efficient with the hierarchical model:

1. Homogeneous annealing: we assign to each level of the pyramid the same, initially high, temperature. The relaxation is performed at this fixed temperature until an equilibrium is reached (i.e. until the change of the energy function associated with the model is less than a threshold). The temperature is then lowered. The algorithm is stopped at a low temperature close to 0. This algorithm can be described by a sequence of homogeneous Markov chains, each generated at a fixed temperature; the temperature is decreased between subsequent chains.

2. Inhomogeneous annealing: the same initially high temperature is assigned to each level; however, the temperature is now lowered after each transition. In this case, the algorithm


[Figure 10: Memory complexity of the hierarchical model.]

[Figure 11: Communication scheme of the hierarchical model.]

is described by an inhomogeneous Markov chain, with the temperature decreased between subsequent transitions.

3. Multi-Temperature Annealing (MTA): with the higher levels, we associate higher temperatures, which make the algorithm less sensitive to local minima. At a finer resolution, however, the relaxation is performed at a lower temperature (at the bottom level, it is close to 0).

In all cases, the final configuration of the finest level is taken as the solution of the problem.
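The following sketch of an MTA-style temperature assignment is ours and only illustrates the idea of point 3; the schedule actually analysed by the report is the logarithmic one of Section 5.3 and Appendix A, and the constants below are arbitrary:

```python
# Level i of an (M+1)-level pyramid runs hotter than level i-1, and every level
# cools as the iteration count k grows.
import math

def mta_temperature(i, k, T_bottom=0.05, step=0.5, c=4.0):
    """Temperature of level i at iteration k >= 1 (illustrative form)."""
    return T_bottom + i * step + c / math.log(k + 2)

M = 3
for k in (1, 10, 100):
    print([round(mta_temperature(i, k), 2) for i in range(M + 1)])
# each printed row increases with the level; each column decreases with k
```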

5.2 Complexity

In this section, we study the complexity of the optimization of the hierarchical model, in terms of the required memory (or number of processors in the parallel implementation) and the required communication, compared with the monogrid model.

Memory/processors: we refer to the notations of Section 3. Let us suppose that our image is of size $W \times H$. Following the procedure described in Section 3.1, we generate a pyramid containing $M + 1$ levels. Without loss of generality, we can assume that $W/w \ge H/h$, where $w \times h$ is the block size, both $w$ and $h$ being greater than or equal to two. The hierarchical model requires a maximum of $(1 + 1/w)WH$ processors (cf. Equation (37)), since all levels must be stored at the same time. The memory (or number of processors) required for the storage of these levels (see Figure 10), considering a rectangular shape, is given by:

$$WH + \frac{WH}{wh} + \frac{WH}{(wh)^2} + \cdots + \frac{WH}{(wh)^M} = WH \sum_{i=0}^{M} \frac{1}{(wh)^i} < \left(1 + \frac{1}{w}\right) WH \tag{37}$$
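A quick numeric instance of the bound (ours): for a 512 x 512 image with 2 x 2 blocks, the whole pyramid needs fewer than $(1 + 1/2)\,WH$ sites:

```python
W = H = 512
w = h = 2
M = 9  # 2^9 = 512, so the pyramid has 10 levels
total = sum((W * H) // (w * h) ** i for i in range(M + 1))
print(total, (1 + 1 / w) * W * H)  # 349525 < 393216.0, as Equation (37) predicts
```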


Communication: considering only the first and second order cliques (the ones mostly used in practice; see Figure 11), it is clear that we have $wh + 1$ more communications per processor: each site interacts with its ancestor (there is one) and its descendants (there are $wh$).

It turns out that the new model demands more processors and more computer time. However, as we shall see later, experiments show that the new interaction is a better way to communicate between the grids, yielding faster convergence (with respect to the number of iterations) for the stochastic relaxation algorithms, and giving estimates which are closer to the global optimum for deterministic as well as stochastic relaxation schemes.

5.3 Convergence Study

The mathematical model of Simulated Annealing (SA) was extensively studied by Aarts and van Laarhoven in [24]. We briefly describe this model considering the Metropolis algorithm (for the Gibbs sampler, see [13]).

SA generates a sequence of configurations which constitutes a Markov chain. $P_{\omega\eta}(k-1, k)$ is the probability that the configuration obtained after the $k$th transition is $\eta$, given that the previous configuration is $\omega$ (both $\omega, \eta \in \Omega$). Furthermore, let $X(k)$ denote the state reached after the $k$th transition. The probability of this event is given by:

$$P(X(k) = \omega) = \sum_{\eta} P(X(k-1) = \eta)\, P_{\eta\omega}(k-1, k), \quad k = 1, 2, \ldots \tag{38}$$

If the transition probability $P_{\omega\eta}(k-1, k)$ does not depend on $k$, the corresponding chain is homogeneous, otherwise it is inhomogeneous. The transition probabilities also depend on the temperature parameter $T$. Thus, if $T$ is kept constant, the chain is homogeneous and the transition matrix $P = P(T)$ can be written as:

$$P_{\omega\eta}(T) = \begin{cases} G_{\omega\eta}(T)\, A_{\omega\eta}(T) & \forall \eta \ne \omega, \\ 1 - \sum_{\zeta \ne \omega} G_{\omega\zeta}(T)\, A_{\omega\zeta}(T) & \eta = \omega, \end{cases} \tag{39}$$

where $G_{\omega\eta}(T)$ is the generation probability of generating $\eta$ from $\omega$, and $A_{\omega\eta}(T)$ is the acceptance probability of configuration $\eta$ once it has been generated from $\omega$. It is clear from the definition in Equation (39) that $P(T)$ is a stochastic matrix:

$$\forall \omega : \sum_{\eta} P_{\omega\eta}(T) = 1. \tag{40}$$

In the original formulation of the algorithm, $G_{\omega\eta}(T)$ is given by the uniform distribution on the configurations $\eta$ which differ from $\omega$ in only one component. $A(T)$ is given by the Metropolis criterion:

$$A_{\omega\eta}(T) = \min\left(1, \exp\left(-(U(\eta) - U(\omega))/T\right)\right), \tag{41}$$

where $U(\omega)$ is the cost or energy function.
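A minimal sketch (ours) of one transition of the chain defined by Equations (39) and (41); the single-site proposal realizes the uniform generation matrix $G$:

```python
# One Metropolis transition: propose a single-site change, accept with criterion (41).
import math, random

def metropolis_step(omega, U, n_labels, T, rng=random):
    eta = list(omega)
    s = rng.randrange(len(omega))                 # G: pick one site uniformly
    eta[s] = rng.randrange(n_labels)              # ... and one candidate label uniformly
    dU = U(eta) - U(omega)
    if dU <= 0 or rng.random() < math.exp(-dU / T):   # A: Metropolis criterion (41)
        return eta
    return omega
```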

The SA algorithm reaches a global minimum if, after $K$ transitions ($K$ sufficiently large), the following holds:

$$P(X(K) \in \Omega_{\mathrm{opt}}) = 1, \tag{42}$$


where $\Omega_{\mathrm{opt}}$ is the set of globally minimal configurations. In the following sections, we give the conditions on the matrices $A(T)$ and $G(T)$, described by Aarts and van Laarhoven in [24], that ensure the convergence:

$$\lim_{T \searrow 0} \left( \lim_{K \to \infty} P(X(K) \in \Omega_{\mathrm{opt}}) \right) = 1. \tag{43}$$

5.3.1 Homogeneous Annealing

In this case, we associate the same temperature $T$ with each level of the pyramid. Configurations are then generated at this fixed temperature until equilibrium is reached. What does this mean from the point of view of the generated sequence of states? It constitutes a homogeneous Markov chain with the transition matrix given by Equation (39). After equilibrium is reached, the temperature is lowered and configurations are generated at the new temperature, and so on.

Thus, the algorithm is described by a sequence of homogeneous Markov chains. Each chain is generated at a fixed value of $T$, and $T$ is decreased between subsequent chains. Therefore, the SA algorithm on the pyramid is identical to its monogrid version, and we can apply the same proof [24]: first, conditions are given for the existence of the stationary distribution of the chain, then further conditions are imposed to ensure the convergence of the stationary distribution.

Theorem 5.1 (Feller) The stationary distribution $q$ of a finite homogeneous Markov chain exists if the Markov chain is irreducible and aperiodic. Furthermore, the vector $q$ is uniquely determined by the following equations:

$$\forall \omega : q_\omega > 0, \qquad \sum_{\omega} q_\omega = 1, \tag{44}$$

$$\forall \omega : q_\omega = \sum_{\eta} q_\eta P_{\eta\omega}. \tag{45}$$

Definition 5.1 (Irreducibility) A Markov chain is irreducible iff for all pairs of configurations $(\omega, \eta)$ there is a positive probability of reaching $\eta$ from $\omega$ in a finite number of transitions:

$$\forall \omega, \eta \ \exists n : 1 \le n < \infty \ \wedge \ (P^n)_{\omega\eta} > 0. \tag{46}$$

Definition 5.2 (Aperiodicity) A Markov chain is aperiodic iff for all configurations $\omega \in \Omega$, the greatest common divisor of all integers $n \ge 1$ such that $(P^n)_{\omega\omega} > 0$ is equal to 1.

Since $\forall \omega, \eta, T > 0 : A_{\omega\eta}(T) > 0$ (see Equation (41)), and $\forall \omega, \eta \ (\omega \ne \eta), T > 0 : G_{\omega\eta}(T) > 0$ ($G(T)$ is uniform), irreducibility is satisfied. To establish aperiodicity, the following theorem is used:

Theorem 5.2 An irreducible Markov chain is aperiodic if the following condition is satisfied:

$$\forall T > 0 : \exists \omega_T \in \Omega : P_{\omega_T \omega_T}(T) > 0. \tag{47}$$


Thus, for aperiodicity it is sufficient to assume that

$$\forall T > 0 : \exists \omega_T, \eta_T \in \Omega : A_{\omega_T \eta_T} < 1, \tag{48}$$

which is always satisfied by setting, for all $T > 0$, $\omega_T \in \Omega_{\mathrm{opt}}$ and $\eta_T \notin \Omega_{\mathrm{opt}}$.

Now, further conditions are imposed to ensure the convergence of the stationary distribution (for more details, see [24]):
\[
1.\quad \forall\omega,\eta\in\Omega:\ G_{\eta\omega} = G_{\omega\eta} \tag{49}
\]
\[
2.\quad \forall\omega,\eta,\zeta\in\Omega:\ U(\omega) \le U(\eta) \le U(\zeta) \Rightarrow A_{\omega\zeta}(T) = A_{\omega\eta}(T)\,A_{\eta\zeta}(T) \tag{50}
\]
\[
3.\quad \forall\omega,\eta\in\Omega:\ U(\omega) \ge U(\eta) \Rightarrow A_{\omega\eta}(T) = 1 \tag{51}
\]
\[
4.\quad \forall\omega,\eta\in\Omega,\ T>0:\ U(\omega) < U(\eta) \Rightarrow A_{\omega\eta}(T) < 1 \tag{52}
\]
\[
5.\quad \forall\omega,\eta\in\Omega:\ U(\omega) < U(\eta) \Rightarrow \lim_{T\searrow 0} A_{\omega\eta}(T) = 0 \tag{53}
\]

It is clear that condition 1 is satisfied in our case, since $G(T)$ is uniform: its diagonal is 0 and the other elements are equal. Conditions 2-5 are implied by the definition of $A(T)$ in Equation (41). These conditions are sufficient but not necessary to ensure the convergence. In this case, the stationary distribution is given by:

\[
q_\omega(T) = \frac{\exp(-(U(\omega)-U_{\mathrm{opt}})/T)}{\sum_{\eta\in\Omega} \exp(-(U(\eta)-U_{\mathrm{opt}})/T)} \tag{54}
\]

5.3.2 Inhomogeneous Annealing

In this case too, we associate the same temperature $T$ with each level, but configurations are now generated such that $T$ is lowered after each transition. The obtained sequence of configurations constitutes an inhomogeneous Markov chain whose transition matrix $P(k-1,k)$ is defined by:

\[
P_{\omega\eta}(k-1,k) =
\begin{cases}
G_{\omega\eta}(T_k)\,A_{\omega\eta}(T_k) & \forall\eta\neq\omega\\[2pt]
1 - \sum_{\eta\neq\omega} G_{\omega\eta}(T_k)\,A_{\omega\eta}(T_k) & \eta = \omega
\end{cases}
\tag{55}
\]

As previously, the annealing scheme on the pyramid is analogous to the monogrid case, thus the same proof [24] applies:

Since $T$ is changed between subsequent transitions, the convergence conditions not only relate to the matrices $G(T_k)$ and $A(T_k)$ but also impose restrictions on the way the control parameter $T$ is changed. The following assumptions are made:

\[
1.\quad \lim_{k\to\infty} T_k = 0 \tag{56}
\]
\[
2.\quad T_k \ge T_{k+1},\quad k = 0,1,\ldots \tag{57}
\]

Definition 5.3 (Weak ergodicity) An inhomogeneous Markov chain is weakly ergodic if for all $m \ge 1$ and $\omega,\eta,\zeta \in \Omega$:
\[
\lim_{k\to\infty} \bigl(P_{\omega\zeta}(m,k) - P_{\eta\zeta}(m,k)\bigr) = 0 \tag{58}
\]

where the transition matrix $P(m,k)$ is defined by:
\[
P_{\omega\eta}(m,k) = P(X(k) = \eta \mid X(m) = \omega). \tag{59}
\]


Definition 5.4 (Strong ergodicity) An inhomogeneous Markov chain is strongly ergodic if there exists a vector $\pi$ satisfying
\[
\sum_{\omega} \pi_\omega = 1,\qquad \forall\omega:\ \pi_\omega > 0, \tag{60}
\]
such that for all $m \ge 1$ and $\omega,\eta \in \Omega$:
\[
\lim_{k\to\infty} P_{\omega\eta}(m,k) = \pi_\eta. \tag{61}
\]

Theorem 5.3 (Seneta) An inhomogeneous Markov chain is weakly ergodic iff there is a strictly increasing sequence of positive numbers $k_l$ such that
\[
\sum_{l=1}^{\infty} \bigl(1 - \tau_1(P(k_l,k_{l+1}))\bigr) = \infty, \tag{62}
\]
where $\tau_1(P)$, the coefficient of ergodicity of $P$, is defined as
\[
\tau_1(P) = 1 - \min_{\omega,\eta} \sum_{\zeta} \min(P_{\omega\zeta}, P_{\eta\zeta}). \tag{63}
\]

Theorem 5.4 (Isaacson and Madsen) An inhomogeneous Markov chain is strongly ergodic if it is weakly ergodic and if for all $k$ there exists a vector $\pi(k)$ such that $\pi(k)$ is an eigenvector with eigenvalue 1 of $P(k-1,k)$,
\[
\sum_{\omega} |\pi_\omega(k)| = 1 \quad\text{and}\quad \sum_{k=1}^{\infty} \sum_{\omega\in\Omega} |\pi_\omega(k) - \pi_\omega(k+1)| < \infty. \tag{64}
\]
Moreover, if $\pi = \lim_{k\to\infty} \pi(k)$, then $\pi$ is the vector in Definition 5.4 satisfying Equation (61).

Under the assumptions of the previous section on $A(T)$ and $G(T)$, there exists an eigenvector $q(T_k)$ of $P(k-1,k)$ for each $k \ge 0$, namely the stationary distribution of the homogeneous Markov chain. Using Theorem 5.4 with $\pi(k) = q(T_k)$, strong ergodicity can be proved by showing that the following conditions are satisfied:

1. The Markov chain is weakly ergodic.

2. The $q(T_k)$ satisfy Equation (64).

This was proved by Geman and Geman in [13]. They show that if
\[
T_k \ge \frac{N\Delta}{\log k}, \tag{65}
\]
\[
\Delta = \max_{\omega\in\Omega} U(\omega) - \min_{\omega\in\Omega} U(\omega), \tag{66}
\]
then Equation (62) is satisfied and thus weak ergodicity is obtained.
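As an illustration, the slowest admissible cooling of Equation (65) can be written down directly; $N$ and $\Delta$ are problem constants, and the function name is ours:

```python
import math

def geman_geman_temperature(k, N, delta):
    """Logarithmic schedule of Equation (65): T_k = N * Delta / log k,
    defined for k >= 2; any T_k at least this large also qualifies."""
    return N * delta / math.log(k)

# example: with N = 64 sites and energy range Delta = 1.0, the
# temperature is still about 6.95 after 10000 iterations
print(geman_geman_temperature(10_000, N=64, delta=1.0))
```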


5.3.3 Multi-Temperature Annealing

The main purpose of this section is the study of a new Multi-Temperature Annealing (MTA) schedule. We associate different temperatures with different levels of the pyramid, such that the temperature depends not only on the iteration but also on the clique. In this case, configurations are generated at different temperatures at different sites. The temperature is then lowered after each transition according to the MTA schedule (see Theorem 5.5). More generally, we have the following problem:

Let $S = \{s_1,\ldots,s_N\}$ be a set of sites, $\mathcal{G}$ some neighborhood system with cliques $\mathcal{C}$, and $X$ a MRF over these sites with energy function $U$. We define an annealing scheme where the temperature $T$ depends on the iteration $k$ and on the cliques $C \in \mathcal{C}$. Let $\oslash$ denote the following operation:

\[
P(X = \omega) = \pi_{T(k,C)}(\omega) = \frac{\exp(-U(\omega) \oslash T(k,C))}{\sum_{\eta\in\Omega} \exp(-U(\eta) \oslash T(k,C))} \tag{67}
\]
where
\[
U(\omega) \oslash T(k,C) = \sum_{C\in\mathcal{C}} \frac{V_C(\omega)}{T(k,C)}. \tag{68}
\]
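The operation $\oslash$ is thus a clique-wise division before summation. A minimal sketch, with clique potentials and temperatures stored in dictionaries keyed by the same (hypothetical) clique identifiers:

```python
def energy_over_temperatures(clique_potentials, clique_temperatures):
    """U(omega) "oslash" T(k,C) of Equation (68): every clique potential
    V_C(omega) is divided by its own temperature T(k,C) before summing."""
    return sum(V / clique_temperatures[C]
               for C, V in clique_potentials.items())

# with a single common temperature, the operation reduces to U(omega)/T:
V = {"C1": 2.0, "C2": -1.0}
print(energy_over_temperatures(V, {"C1": 2.0, "C2": 2.0}))  # (2 - 1)/2 = 0.5
```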

Let us suppose that the sites are visited for updating in the order $\{n_1, n_2, \ldots\} \subseteq S$. The resulting stochastic process is denoted by $\{X(k),\ k = 0,1,2,\ldots\}$, where $X(0)$ is the initial configuration. $X(k)$ is an inhomogeneous Markov chain with transition matrix:

\[
P_{\omega\eta}(k-1,k) =
\begin{cases}
G_{\omega\eta}(T(k,C))\,A_{\omega\eta}(T(k,C)) & \forall\eta\neq\omega\\[2pt]
1 - \sum_{\eta\neq\omega} G_{\omega\eta}(T(k,C))\,A_{\omega\eta}(T(k,C)) & \eta = \omega
\end{cases}
\tag{69}
\]

Considering the Gibbs sampler, the generation matrix $G_{\omega\eta}(T(k,C))$ and the acceptance matrix $A_{\omega\eta}(T(k,C))$ are given by:

\[
G_{\omega\eta}(T(k,C)) = G_{\omega\eta}(k) =
\begin{cases}
1 & \text{if } \eta = \omega|_{\omega_{n_k}=\lambda} \text{ for some } \lambda\in\Lambda\\
0 & \text{otherwise}
\end{cases}
\tag{70}
\]
\[
A_{\omega\eta}(T(k,C)) = \pi_{T(k,C)}(\eta) \tag{71}
\]

The transition matrix at time k is then of the following form:

\[
P_{\omega\eta}(k) = G_{\omega\eta}(k)\,A_{\omega\eta}(T(k,C)) =
\begin{cases}
\pi_{T(k,C)}(\eta) & \text{if } \eta = \omega|_{\omega_{n_k}=\lambda} \text{ for some } \lambda\in\Lambda\\
0 & \text{otherwise}
\end{cases}
\tag{72}
\]
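A sketch of one such transition in Python: the label of the visited site $n_k$ is redrawn from the local conditional of the Gibbs distribution, all other sites being kept fixed. The interface, a caller-supplied local energy already divided by the clique temperatures as in Equation (68), is our own assumption for the sake of the example:

```python
import math
import random

def gibbs_site_update(omega, site, labels, local_energy, rng=random):
    """One transition of Equations (70)-(72): draw a new label for `site`
    with probability proportional to exp(-local_energy(label)), where
    local_energy already includes the division by T(k, C)."""
    weights = [math.exp(-local_energy(l)) for l in labels]
    total = sum(weights)
    u, acc = rng.random() * total, 0.0
    for label, w in zip(labels, weights):
        acc += w
        if u <= acc:
            omega[site] = label
            break
    else:
        omega[site] = labels[-1]   # guard against floating-point round-off
    return omega
```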

Let $\Omega_{\mathrm{opt}}$ be the set of globally optimal configurations:
\[
\Omega_{\mathrm{opt}} = \{\omega\in\Omega :\ U(\omega) = \min_{\eta} U(\eta)\} \tag{73}
\]

Let $\pi_0$ be the uniform distribution on $\Omega_{\mathrm{opt}}$, and define:
\[
U^{\sup} = \max_{\omega} U(\omega),\qquad U^{\inf} = \min_{\omega} U(\omega),\qquad \Delta = U^{\sup} - U^{\inf}. \tag{74}
\]

The following theorem gives an annealing schedule, basically the same as in [13]. However, the temperature here is a function of $k$ and $C \in \mathcal{C}$.


Theorem 5.5 (Multi-Temperature Annealing) Assume that there exists an integer $\kappa \ge N$ such that for every $k = 0,1,2,\ldots$, $S \subseteq \{n_{k+1}, n_{k+2}, \ldots, n_{k+\kappa}\}$. Let $T(k,C)$ be any decreasing sequence of temperatures for which

1. There exists $T_k^{\inf}$ such that for all $C \in \mathcal{C}$: $T_k^{\inf} \le T(k,C)$.

2. $\lim_{k\to\infty} T(k,C) = 0$.

3. For all $k \ge k_0$, for some integer $k_0 \ge 2$: $T_k^{\inf} \ge N\Delta/\log k$.

Then for any starting configuration $\eta \in \Omega$ and for every $\omega \in \Omega$:
\[
\lim_{k\to\infty} P(X(k) = \omega \mid X(0) = \eta) = \pi_0(\omega). \tag{75}
\]

The first condition states the existence of a lower bound for the temperature function. The second condition states that the temperature decreases as the system evolves. The third condition ensures that each "local" temperature decreases in an appropriate way, i.e. not too quickly. The proof of this theorem can be found in Appendix A.
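To illustrate the schedule, here is a sketch of one admissible shape of the temperature function in Python. The per-level initial temperatures mirror the ones used in the experiments of Section 7 (4, 3, 2, 1 from the coarsest to the finest level); the common $1/\log k$ decay is compatible with conditions 1-3 provided the constants are chosen large enough that $T_k^{\inf} \ge N\Delta/\log k$ holds, which we simply assume here (see also the errata at the end of the report, which further constrains the spread between levels):

```python
import math

def mta_temperature(k, level, T0=(4.0, 3.0, 2.0, 1.0), k0=2):
    """Multi-Temperature Annealing sketch: T(k, C) depends on the pyramid
    level carrying the clique C.  All levels share the same logarithmic
    decay, so T_k^inf = min(T0) * log(k0 + 1) / log(k + 1) plays the role
    of the lower bound required by Theorem 5.5 (assuming min(T0) is
    large enough)."""
    k = max(k, k0)
    return T0[level] * math.log(k0 + 1) / math.log(k + 1)

# the coarsest level (level 0) stays hotter than the finest (level 3):
print([round(mta_temperature(100, lv), 2) for lv in range(4)])
```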

6 Application to Supervised Image Classification

We have applied the proposed models to supervised image classification. In this section, we give the equations of the MRF models (monogrid, multiscale, hierarchical) used for this problem.

First, we give the equations for the monogrid model [21, 23]. Let us suppose that the observations consist of the grey-levels. A very general problem is to find the labeling $\hat\omega$ which maximizes $P(\omega \mid \mathcal{F})$. Bayes' theorem tells us that $P(\omega \mid \mathcal{F}) = \frac{1}{P(\mathcal{F})} P(\mathcal{F} \mid \omega) P(\omega)$. Actually, $P(\mathcal{F})$ does not depend on the labeling $\omega$, and we make the assumption that $P(\mathcal{F} \mid \omega) = \prod_{s\in S} P(f_s \mid \omega_s)$. It is then easy to see that, under some independence assumptions [5], the global labeling which we are trying to find is given by:
\[
\hat\omega = \arg\max_{\omega\in\Omega}\ \prod_{s\in S} P(f_s \mid \omega_s) \prod_{C\in\mathcal{C}} \exp(-V_C(\omega_C)). \tag{76}
\]

It is obvious from this expression that the a posteriori probability also derives from a MRF. The energies of the cliques of order 1 directly reflect the probabilistic modeling of labels without context, which would be used for labeling the pixels independently. Let us assume that $P(f_s \mid \omega_s)$ is Gaussian; the class $\lambda \in \Lambda$ is represented by its mean value $\mu_\lambda$ and its deviation $\sigma_\lambda$. We get the following energy function (see Equation (17)):

\[
U_1(\omega,\mathcal{F}) = \sum_{s\in S} \left( \ln\bigl(\sqrt{2\pi}\,\sigma_{\omega_s}\bigr) + \frac{(f_s - \mu_{\omega_s})^2}{2\sigma_{\omega_s}^2} \right) \tag{77}
\]
and
\[
U_2(\omega) = \sum_{C\in\mathcal{C}} V_2(\omega_C) \tag{78}
\]
where
\[
V_2(\omega_C) = V_{\{s,r\}}(\omega_s,\omega_r) =
\begin{cases}
-\beta & \text{if } \omega_s = \omega_r\\
+\beta & \text{if } \omega_s \neq \omega_r
\end{cases}
\tag{79}
\]


We can easily extend these equations to the multiscale model using Equations (21) and (22). For simplicity, the block size is supposed to be $n \times n$ (that is, $w = h = n$). Then we get [19, 20]:

\[
U_1^i(\omega^i,\mathcal{F}) = \sum_{s^i\in S^i} V_1^i(\omega^i_{s^i},\mathcal{F})
\]
where
\[
V_1^i(\omega^i_{s^i},\mathcal{F}) = \sum_{s\in b^i_{s^i}} V_1(\omega_s,f_s) + \sum_{C\in\mathcal{C}^i_{s^i}} V_2(\omega_C)
= \sum_{s\in b^i_{s^i}} \left( \ln\bigl(\sqrt{2\pi}\,\sigma_{\omega_s}\bigr) + \frac{(f_s-\mu_{\omega_s})^2}{2\sigma_{\omega_s}^2} \right) - p^i\beta \tag{80}
\]
and
\[
U_2^i(\omega^i) = \sum_{C^i=\{r^i,s^i\}\in\mathcal{C}^i} V_2^i(\omega^i_{C^i})
\]
where
\[
V_2^i(\omega^i_{C^i}) = \sum_{\{r,s\}\in D_{C^i}} V_2(\omega_r,\omega_s) =
\begin{cases}
-q^i\beta & \text{if } \omega_r = \omega_s\\
+q^i\beta & \text{if } \omega_r \neq \omega_s
\end{cases}
\tag{81}
\]

The values of $p^i$ and $q^i$ depend on the chosen block size and the neighborhood structure: $p^i$ is the number of cliques included in the same block at scale $B^i$ and $q^i$ is the number of cliques between two neighboring blocks at scale $B^i$. Considering blocks of $n \times n$ and a first order neighborhood system, we get:

\[
p^i = 2n^i(n^i - 1) \tag{82}
\]
\[
q^i = n^i \tag{83}
\]
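For example, the two counts are immediate to compute; a one-line sketch:

```python
def clique_counts(n, i):
    """p^i and q^i of Equations (82)-(83) for n x n blocks and a first
    order neighborhood system at scale B^i."""
    ni = n ** i
    return 2 * ni * (ni - 1), ni          # (p^i, q^i)

print(clique_counts(2, 1))   # blocks of 2 x 2 at scale 1: p = 4, q = 2
```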

In the case of the hierarchical model, we derive [19, 20] the following equations (using Equations (35) and (36)):

\[
\bar U_1(\bar\omega,\mathcal{F}) = \sum_{i=0}^{M} \sum_{s^i\in S^i} V_1^i(\omega^i_{s^i},\mathcal{F}) \tag{84}
\]
and
\[
\bar U_2(\bar\omega) = \sum_{i=0}^{M} \sum_{C^i\in\mathcal{C}^i} V_2^i(\omega^i_{C^i}) + \sum_{\bar C\in\bar{\mathcal{C}}_3} \bar V_2(\bar\omega_{\bar C}) \tag{85}
\]
where
\[
\bar V_2(\bar\omega_{\bar C}) = \bar V_{\{\bar s,\bar r\}}(\bar\omega_{\bar s},\bar\omega_{\bar r}) =
\begin{cases}
-\gamma & \text{if } \bar\omega_{\bar s} = \bar\omega_{\bar r}\\
+\gamma & \text{if } \bar\omega_{\bar s} \neq \bar\omega_{\bar r}
\end{cases}
\tag{86}
\]

7 Experimental Results

We compare the Gibbs sampler [13] and the Iterated Conditional Mode [4, 18] using three models for each algorithm (original, multiscale and hierarchical). We have also compared the inhomogeneous and MTA schedules. All tests have been conducted on a Connection Machine CM200 [17] with 8K processors. In the tables (see Appendix C), we give for each model and each method the number of levels in the pyramid (for the monogrid model, this is 1), the Virtual Processor Ratio (VPR) [17], the initial temperature (for the hierarchical model, this is not the same at each level, using the new MTA schedule!), the number of iterations, the computing time, the classification error (i.e. the number of misclassified pixels) and the parameters $\beta$ (see Equations (79), (80), (81)) and $\gamma$ (see Equation (86)).


Figure 12: Energy decrease with the MTA schedule (global energy at $T=1$, from 74291.0 down to 53421.4 over 100 iterations).

Figure 13: Energy decrease with the inhomogeneous annealing schedule (global energy at $T=1$, from 93208.9 down to 53415.4 over 238 iterations).

7.1 Comparison of the Schedules

In Figure 14 we compare the inhomogeneous and MTA schedules on a noisy synthetic image using the Gibbs sampler. In both cases, the parameters were strictly the same; the only difference is the applied schedule. The pyramid contains 4 levels, yielding a VPR equal to 4. The initial temperatures were respectively 4 (at the highest level), 3, 2 and 1 (at the lowest level). The potential $\beta$ equals 0.7 and $\gamma$ equals 0.1. In Figure 13 (resp. 12), we show the global energy (computed at a fixed temperature) versus the number of iterations for the inhomogeneous (resp. MTA) schedule. Both reach practically the same minimum (53415.4 for the inhomogeneous schedule and 53421.4 for the MTA schedule); however, the inhomogeneous schedule requires 238 iterations (796.8 sec. CPU time) whereas the MTA schedule requires only 100 iterations (340.6 sec. CPU time) to converge.

7.2 Comparison of the Models

First, we have tested the models on noisy synthetic images of size $128 \times 128$. The first one is a checkerboard image (see Figure 16, Table 1) with 2 classes and a SNR equal to $-5$ dB. Note that the multiscale model gives better results than the monogrid one, especially with the ICM algorithm. This type of image is well adapted to the multiscale model, because the image has a rectangular structure similar to the model itself. The hierarchical model gives the best results with both algorithms, but the computing time is considerably greater than for the multiscale or monogrid model. The reason is that, in the hierarchical case, the whole pyramid is stored at the same time, yielding a greater VPR. On the other hand, we cannot use the fast "NEWS" communication scheme [17] as in the other cases.

Figure 14: Results of the Gibbs sampler on a synthetic image with the inhomogeneous and MTA schedules (panels: noisy image (SNR = 3 dB), inhomogeneous schedule, MTA schedule).

In the second image, we have added different geometrical forms (a circle and a triangle) to the checkerboard image (see Figure 17, Table 2). In this case, we studied the geometrical sensitivity of the models. The Gibbs sampler gives nearly the same result in all cases. The ICM, however, is more sensitive: the multiscale model gives a better result than the monogrid one, but the result is not fine in the triangle and the circle. These forms have a structure different from the block structure of the model, the initialisation was wrong in these regions, and the ICM was not able to correct these errors. In the hierarchical case, instead of relying only on the initialisation, we have a real-time communication between the levels, which gives results close to the ones obtained with the Gibbs sampler. Of course, this model requires more computing time than the other ones.

Figure 15: Histogram of the SPOT image with 6 classes.

The third image is a checkerboard image with 16 classes (see Figure 18 and Table 3). For the ICM, there is a significant improvement, but for the Gibbs sampler we observe only a slight improvement in the multiscale and hierarchical cases.

In Figures 19, 20 and 21, we show some real images of size $256 \times 256$: a SPOT image with 4 classes (see Figure 19, Table 4), an indoor scene with 4 classes (see Figure 20) and a medical image with 3 classes (see Figure 21).

Finally, we present a SPOT image of size $512 \times 512$ (see Figure 22) with ground truth data (see Figure 23). In the following table, we give the mean ($\mu$) and the deviation ($\sigma^2$) for each class (we have 6 classes):


class        1      2      3      4      5      6
μ           65.3   81.3   75.4   98.5   82.5   129.0
σ²           6.4   12.7   14.9   16.8    9.46  183.2

As we can see, classes 2 and 5 have nearly the same parameters; it is difficult to distinguish between them. Figure 15 shows the histogram of the original image. We can clearly distinguish three peaks (at about 64, 80 and 120). Figure 24 (resp. Figure 25) shows the results obtained with the ICM (resp. the Gibbs sampler). For these results, we give a map drawn by an expert (ground truth data, see Figure 23). The classes 1-6 correspond to the regions B3c, B3b, B3d, a2, hc and 92a on the map. For the hierarchical model, a slight improvement can be noticed in the results of the Gibbs sampler; however, for the ICM, the improvement is more significant. In Table 5, we give the parameters and the computing time for each model and each method.

8 Conclusion

In this report, we have presented a classical multiscale model and proposed a new hierarchical MRF model based on the classical one. We have introduced a new interaction scheme between two neighbor grids in the label pyramid. The new connections allow local interactions to propagate more efficiently, yielding faster convergence (w.r.t. the number of iterations) in many cases and giving estimates closer to the optimum for deterministic as well as for stochastic relaxation techniques. On the other hand, these interactions make the model more complex, which demands computationally more expensive algorithms.

We have also proposed a new annealing scheme, the Multi-Temperature Annealing, which is especially well adapted to the minimization of the energy function of the hierarchical model. This algorithm can be run in parallel on the entire pyramid and usually decreases the computing time compared to the classical schemes. A generalization of the annealing theorem of Geman and Geman [13] has been proposed, which gives a theoretical background for the convergence of this method towards the global optimum.

Finally, the hierarchical model as well as the theoretical study given in this report are quite general. They have been adapted for supervised image classification, but they can be used for other low level vision tasks such as edge detection, image restoration, data fusion, motion analysis, etc. We are currently working on the parameter estimation of these models for unsupervised image classification.


Appendix


A Proof of The Multi-Temperature-Annealing Theorem

We follow the proof of the annealing theorem given by Geman and Geman in [13]. Essentially, we can apply the same proof; only a slight modification is needed.

A.1 Notation

We recall a few notations used previously in this report: $S = \{s_1,\ldots,s_N\}$ denotes the set of sites, $\Lambda = \{0,1,\ldots,L-1\}$ is a common state space, and $\omega, \eta, \eta', \ldots \in \Omega$ denote configurations of the label field, where $\Omega = \Lambda^N$ is finite. The sites are updated in the order $\{n_1, n_2, \ldots\} \subseteq S$. The generated configurations constitute an inhomogeneous Markov chain $\{X(k),\ k = 0,1,2,\ldots\}$, where $X(0)$ is the initial configuration. The transition $X(k-1) \to X(k)$ is controlled by the Gibbs distribution $\pi_{T(k,C)}$ according to the transition matrix at time $k$:

\[
P_{\omega\eta}(k) = G_{\omega\eta}(k)\,A_{\omega\eta}(T(k,C)) =
\begin{cases}
\pi_{T(k,C)}(\eta) & \text{if } \eta = \omega|_{\omega_{n_k}=\lambda} \text{ for some } \lambda\in\Lambda\\
0 & \text{otherwise}
\end{cases}
\tag{87}
\]
where
\[
\pi_{T(k,C)}(\omega) = \frac{\exp(-U(\omega) \oslash T(k,C))}{\sum_{\eta\in\Omega} \exp(-U(\eta) \oslash T(k,C))} \tag{88}
\]
with
\[
U(\omega) \oslash T(k,C) = \sum_{C\in\mathcal{C}} \frac{V_C(\omega)}{T(k,C)}. \tag{89}
\]

We also use the following definitions:
\[
U^{\sup} = \max_{\omega} U(\omega),\qquad U^{\inf} = \min_{\omega} U(\omega),\qquad \Delta = U^{\sup} - U^{\inf}. \tag{90}
\]

Given any starting distribution $\mu_0$, the distribution of $X(k)$ is given by the vector $\mu_0 \prod_{i=1}^{k} P(i)$:
\[
P_{\mu_0}(X(k) = \omega) = \left( \mu_0 \prod_{i=1}^{k} P(i) \right)_{\omega} \tag{91}
\]
\[
= \sum_{\eta} P(X(k) = \omega \mid X(0) = \eta)\,\mu_0(\eta). \tag{92}
\]

We use the following notation for transitions: $\forall l < k$ and $\omega, \eta \in \Omega$:
\[
P(k,\omega \mid l,\eta) = P(X(k) = \omega \mid X(l) = \eta),
\]
and for any distribution $\mu$ on $\Omega$:
\[
P(k,\omega \mid l,\mu) = \sum_{\eta} P(X(k) = \omega \mid X(l) = \eta)\,\mu(\eta).
\]

Sometimes we use this notation as $P(k,\cdot \mid l,\mu)$, where "$\cdot$" means any configuration from $\Omega$. Finally, let $\|\mu - \nu\|$ denote the following distance between two distributions on $\Omega$:
\[
\|\mu - \nu\| = \sum_{\omega} |\mu(\omega) - \nu(\omega)|.
\]
It is clear that $\lim_{n\to\infty} \mu_n = \mu$ in distribution (i.e. $\forall\omega:\ \mu_n(\omega) \to \mu(\omega)$) if and only if $\|\mu_n - \mu\| \to 0$.


A.2 Proof of the Theorem

First, we state two lemmas which imply Theorem 5.5:

Lemma A.1 For every $k_0 = 0,1,2,\ldots$:
\[
\lim_{k\to\infty}\ \sup_{\omega,\eta',\eta''} \bigl|P(X(k)=\omega \mid X(k_0)=\eta') - P(X(k)=\omega \mid X(k_0)=\eta'')\bigr| = 0.
\]

Proof of Lemma A.1:

Fix $k_0 = 0,1,2,\ldots$ and define $K_l = k_0 + l\kappa$, $l = 0,1,2,\ldots$, where $\kappa$ is the number of transitions necessary for a full sweep of $S$ (for every $k = 0,1,2,\ldots$: $S \subseteq \{n_{k+1}, n_{k+2}, \ldots, n_{k+\kappa}\}$). Let $\delta(k)$ be the smallest probability among the local characteristics (see Definition 2.3):
\[
\delta(k) = \inf_{\substack{1\le i\le N\\ \omega\in\Omega}} \pi_{T(k,C)}(X_{s_i} = \omega_{s_i} \mid X_{s_j} = \omega_{s_j},\ j \neq i).
\]

A lower bound for $\delta(k)$ is given by:
\[
\delta(k) \ge \frac{\exp(-U^{\sup} \oslash T(k,C))}{L\,\exp(-U^{\inf} \oslash T(k,C))} = \frac{\exp(-\Delta \oslash T(k,C))}{L} \ge \frac{1}{L}\exp(-\Delta/T_k^{\inf}).
\]

Now fix $l$ and define $m_i$ as the time of the last replacement of site $s_i$ before $K_l + 1$ (that is, during the $l$-th full sweep):
\[
\forall 1 \le i \le N:\quad m_i = \sup\{k :\ k \le K_l,\ n_k = s_i\}.
\]
Without loss of generality, we can assume that $m_1 > m_2 > \cdots > m_N$ (otherwise relabel the sites). Then:

\[
\begin{aligned}
P(X(K_l) &= \omega \mid X(K_{l-1}) = \omega')\\
&= P(X_{s_1}(m_1) = \omega_{s_1},\ X_{s_2}(m_2) = \omega_{s_2},\ \ldots,\ X_{s_N}(m_N) = \omega_{s_N} \mid X(K_{l-1}) = \omega')\\
&= \prod_{i=1}^{N} P(X_{s_i}(m_i) = \omega_{s_i} \mid X_{s_{i+1}}(m_{i+1}) = \omega_{s_{i+1}},\ \ldots,\ X_{s_N}(m_N) = \omega_{s_N},\ X(K_{l-1}) = \omega')\\
&\ge \prod_{i=1}^{N} \delta(m_i) \ \ge\ L^{-N} \prod_{i=1}^{N} \exp(-\Delta/T_{m_i}^{\inf}) \ \ge\ L^{-N} \exp\left(-\frac{N\Delta}{T_{k_0+l\kappa}^{\inf}}\right)
\end{aligned}
\tag{93}
\]

since $m_i \le K_l = k_0 + l\kappa$, $i = 1,2,\ldots,N$, and $T_k^{\inf}$ is decreasing. If $k_0 + l\kappa$ is sufficiently large, Equation (93) can be continued as:
\[
P(X(K_l) = \omega \mid X(K_{l-1}) = \omega') \ge L^{-N}(k_0 + l\kappa)^{-1}.
\]

For a sufficiently small constant $\varphi$ ($0 < \varphi \le 1$), we can assume that
\[
\inf_{\omega,\omega'} P(X(K_l) = \omega \mid X(K_{l-1}) = \omega') \ge \frac{\varphi L^{-N}}{k_0 + l\kappa} \tag{94}
\]
for every $k_0 = 0,1,2,\ldots$ and $l = 1,2,\ldots$, keeping in mind that $K_l$ depends on $k_0$.


Consider now the limit given in Lemma A.1 and, for each $k > k_0$, define $K^{\sup}(k) = \sup\{l :\ K_l < k\}$ (the last sweep before the $k$-th transition), so that $\lim_{k\to\infty} K^{\sup}(k) = \infty$. Fix $k > K_1$:

\[
\begin{aligned}
\sup_{\omega,\eta',\eta''} &\bigl|P(X(k)=\omega \mid X(0)=\eta') - P(X(k)=\omega \mid X(0)=\eta'')\bigr|\\
&= \sup_{\omega}\left( \sup_{\eta} P(X(k)=\omega \mid X(0)=\eta) - \inf_{\eta} P(X(k)=\omega \mid X(0)=\eta) \right)\\
&= \sup_{\omega}\Biggl( \sup_{\eta} \sum_{\omega'} P(X(k)=\omega \mid X(K_1)=\omega')\,P(X(K_1)=\omega' \mid X(0)=\eta)\\
&\qquad\quad - \inf_{\eta} \sum_{\omega'} P(X(k)=\omega \mid X(K_1)=\omega')\,P(X(K_1)=\omega' \mid X(0)=\eta) \Biggr)\\
&:= \sup_{\omega} Q(k,\omega).
\end{aligned}
\]

Furthermore, for each $\omega \in \Omega$:
\[
\sup_{\eta} \sum_{\omega'} P(X(k)=\omega \mid X(K_1)=\omega')\,P(X(K_1)=\omega' \mid X(0)=\eta) \le \sup_{\mu} \sum_{\omega'} P(X(k)=\omega \mid X(K_1)=\omega')\,\mu(\omega'),
\]
where $\mu$ is any probability measure on $\Omega$. Using Equation (94), we get:
\[
\mu(\omega') \ge \frac{\varphi L^{-N}}{k_0 + l\kappa}.
\]

Suppose that $P(X(k)=\omega \mid X(K_1)=\omega')$ is maximized at $\omega' = \omega^{\sup}$ and minimized at $\omega' = \omega^{\inf}$. Then we get:
\[
\sup_{\mu} \sum_{\omega'} P(X(k)=\omega \mid X(K_1)=\omega')\,\mu(\omega') \le \left( 1 - (L^N-1)\frac{\varphi L^{-N}}{k_0+l\kappa} \right) P(X(k)=\omega \mid X(K_1)=\omega^{\sup}) + \frac{\varphi L^{-N}}{k_0+l\kappa} \sum_{\omega'\neq\omega^{\sup}} P(X(k)=\omega \mid X(K_1)=\omega'),
\]
where the last sum equals $P(X(k)=\omega \mid X(K_1)=\omega^{\inf}) + \sum_{\omega'\neq\omega^{\sup},\omega^{\inf}} P(X(k)=\omega \mid X(K_1)=\omega')$; and in a similar way:
\[
\inf_{\mu} \sum_{\omega'} P(X(k)=\omega \mid X(K_1)=\omega')\,\mu(\omega') \ge \left( 1 - (L^N-1)\frac{\varphi L^{-N}}{k_0+l\kappa} \right) P(X(k)=\omega \mid X(K_1)=\omega^{\inf}) + \frac{\varphi L^{-N}}{k_0+l\kappa} \sum_{\omega'\neq\omega^{\inf}} P(X(k)=\omega \mid X(K_1)=\omega'),
\]
where the last sum equals $P(X(k)=\omega \mid X(K_1)=\omega^{\sup}) + \sum_{\omega'\neq\omega^{\sup},\omega^{\inf}} P(X(k)=\omega \mid X(K_1)=\omega')$.


Then it is clear that
\[
Q(k,\omega) \le \left(1 - \frac{\varphi}{k_0 + l\kappa}\right) \bigl(P(X(k)=\omega \mid X(K_1)=\omega^{\sup}) - P(X(k)=\omega \mid X(K_1)=\omega^{\inf})\bigr),
\]

hence:
\[
\begin{aligned}
\sup_{\omega,\eta',\eta''} \bigl|P(X(k)=\omega \mid X(0)=\eta') - P(X(k)=\omega \mid X(0)=\eta'')\bigr| &\le \left(1 - \frac{\varphi}{k_0+\kappa}\right) \sup_{\omega,\eta',\eta''} \bigl|P(X(k)=\omega \mid X(K_1)=\eta') - P(X(k)=\omega \mid X(K_1)=\eta'')\bigr|\\
&\le \left(1 - \frac{\varphi}{k_0+\kappa}\right)\left(1 - \frac{\varphi}{k_0+2\kappa}\right) \sup_{\omega,\eta',\eta''} \bigl|P(X(k)=\omega \mid X(K_2)=\eta') - P(X(k)=\omega \mid X(K_2)=\eta'')\bigr|.
\end{aligned}
\]

Proceeding in this way, we obtain the bound:
\[
\le \prod_{l=1}^{K^{\sup}(k)} \left(1 - \frac{\varphi}{k_0+l\kappa}\right)\ \sup_{\omega,\eta',\eta''} \bigl|P(X(k)=\omega \mid X(K_{K^{\sup}(k)})=\eta') - P(X(k)=\omega \mid X(K_{K^{\sup}(k)})=\eta'')\bigr|,
\]
and finally, since the maximal possible value of the supremum is 1:
\[
\sup_{\omega,\eta',\eta''} \bigl|P(X(k)=\omega \mid X(0)=\eta') - P(X(k)=\omega \mid X(0)=\eta'')\bigr| \le \prod_{l=1}^{K^{\sup}(k)} \left(1 - \frac{\varphi}{k_0+l\kappa}\right).
\]

It is then sufficient to show that
\[
\lim_{m\to\infty} \prod_{l=1}^{m} \left(1 - \frac{\varphi}{k_0+l\kappa}\right) = 0,
\]
which is a well-known consequence of the divergence of the series
\[
\sum_{l} (k_0 + l\kappa)^{-1}
\]
for all $k_0$ and $\kappa$. This completes the proof of Lemma A.1. Q.E.D.

Lemma A.2
\[
\lim_{k_0\to\infty}\ \sup_{k\ge k_0}\ \|P(k,\cdot \mid k_0,\mu_0) - \pi_0\| = 0.
\]

Proof of Lemma A.2:

In the following, let $P_{k_0,k}(\cdot)$ stand for $P(k,\cdot \mid k_0,\mu_0)$, so that for any $k \ge k_0 > 0$:
\[
P_{k_0,k}(\omega) = \sum_{\eta} P(X(k)=\omega \mid X(k_0)=\eta)\,\mu_0(\eta).
\]

First, we show that for any $k > k_0 \ge 0$:
\[
\|P_{k_0,k} - \pi_{T(k,C)}\| \le \|P_{k_0,k-1} - \pi_{T(k,C)}\|. \tag{95}
\]


We can assume for convenience that $n_k = s_1$. Then
\[
\begin{aligned}
\|P_{k_0,k} - \pi_{T(k,C)}\| &= \sum_{(\omega_{s_1},\ldots,\omega_{s_N})} \bigl| \pi_{T(k,C)}(X_{s_1}=\omega_{s_1} \mid X_s=\omega_s,\ s\neq s_1)\,P_{k_0,k-1}(X_s=\omega_s,\ s\neq s_1) - \pi_{T(k,C)}(X_s=\omega_s,\ s\in S) \bigr|\\
&= \sum_{(\omega_{s_2},\ldots,\omega_{s_N})} \Bigl( \sum_{\omega_{s_1}\in\Lambda} \pi_{T(k,C)}(X_{s_1}=\omega_{s_1} \mid X_s=\omega_s,\ s\neq s_1) \Bigr) \bigl| P_{k_0,k-1}(X_s=\omega_s,\ s\neq s_1) - \pi_{T(k,C)}(X_s=\omega_s,\ s\neq s_1) \bigr|\\
&= \sum_{(\omega_{s_2},\ldots,\omega_{s_N})} \bigl| P_{k_0,k-1}(X_s=\omega_s,\ s\neq s_1) - \pi_{T(k,C)}(X_s=\omega_s,\ s\neq s_1) \bigr|\\
&= \sum_{(\omega_{s_2},\ldots,\omega_{s_N})} \Bigl| \sum_{\omega_{s_1}} \bigl( P_{k_0,k-1}(X_s=\omega_s,\ s\in S) - \pi_{T(k,C)}(X_s=\omega_s,\ s\in S) \bigr) \Bigr|\\
&\le \sum_{(\omega_{s_1},\ldots,\omega_{s_N})} \bigl| P_{k_0,k-1}(X_s=\omega_s,\ s\in S) - \pi_{T(k,C)}(X_s=\omega_s,\ s\in S) \bigr|\\
&= \|P_{k_0,k-1} - \pi_{T(k,C)}\|.
\end{aligned}
\]

Second, we prove that $\pi_{T(k,C)}$ converges to $\pi_0$ (the uniform distribution on $\Omega_{\mathrm{opt}}$):
\[
\lim_{k\to\infty} \|\pi_0 - \pi_{T(k,C)}\| = 0.
\]

To see this, let $|\Omega_{\mathrm{opt}}|$ be the number of globally optimal configurations. Then
\[
\begin{aligned}
\lim_{k\to\infty} \pi_{T(k,C)}(\omega) &= \lim_{k\to\infty} \frac{\exp(-U(\omega) \oslash T(k,C))}{\sum_{\omega'\in\Omega_{\mathrm{opt}}} \exp(-U(\omega') \oslash T(k,C)) + \sum_{\omega'\notin\Omega_{\mathrm{opt}}} \exp(-U(\omega') \oslash T(k,C))}\\
&= \lim_{k\to\infty} \frac{\exp(-(U(\omega)-U^{\inf}) \oslash T(k,C))}{|\Omega_{\mathrm{opt}}| + \sum_{\omega'\notin\Omega_{\mathrm{opt}}} \exp(-(U(\omega')-U^{\inf}) \oslash T(k,C))}\\
&= \begin{cases} 0 & \omega \notin \Omega_{\mathrm{opt}}\\[4pt] \dfrac{1}{|\Omega_{\mathrm{opt}}|} & \omega \in \Omega_{\mathrm{opt}} \end{cases}
\end{aligned}
\tag{96}
\]

Finally, we can prove that
\[
\sum_{k=1}^{\infty} \|\pi_{T(k,C)} - \pi_{T(k+1,C)}\| < \infty \tag{97}
\]
since
\[
\sum_{k=1}^{\infty} \|\pi_{T(k,C)} - \pi_{T(k+1,C)}\| = \sum_{\omega} \sum_{k=1}^{\infty} \bigl| \pi_{T(k,C)}(\omega) - \pi_{T(k+1,C)}(\omega) \bigr|
\]
and since
\[
\forall\omega:\ \pi_{T(k,C)}(\omega) \longrightarrow \pi_0(\omega),
\]

it is enough to show that $\pi_T(\omega)$ is monotone in $T$ for every $\omega$. However, it is clear from Equation (96) that

- if $\omega \notin \Omega_{\mathrm{opt}}$, then $\pi_T(\omega)$ is strictly increasing for $0 < T \le \delta$ for some sufficiently small $\delta$;

- if $\omega \in \Omega_{\mathrm{opt}}$, then $\pi_T(\omega)$ is strictly decreasing for all $T > 0$.

Fix $k > k_0 \ge 0$. From Equations (95) and (97), we obtain:
\[
\begin{aligned}
\|P_{k_0,k} - \pi_0\| &\le \|P_{k_0,k} - \pi_{T(k,C)}\| + \|\pi_{T(k,C)} - \pi_0\|\\
&\le \|P_{k_0,k-1} - \pi_{T(k,C)}\| + \|\pi_{T(k,C)} - \pi_0\| \quad\text{by (95)}\\
&\le \|P_{k_0,k-1} - \pi_{T(k-1,C)}\| + \|\pi_{T(k-1,C)} - \pi_{T(k,C)}\| + \|\pi_{T(k,C)} - \pi_0\|\\
&\le \|P_{k_0,k-2} - \pi_{T(k-2,C)}\| + \|\pi_{T(k-2,C)} - \pi_{T(k-1,C)}\| + \|\pi_{T(k-1,C)} - \pi_{T(k,C)}\| + \|\pi_{T(k,C)} - \pi_0\|\\
&\le \cdots \le \|P_{k_0,k_0} - \pi_{T(k_0,C)}\| + \sum_{l=k_0}^{k-1} \|\pi_{T(l,C)} - \pi_{T(l+1,C)}\| + \|\pi_{T(k,C)} - \pi_0\|.
\end{aligned}
\]

On the other hand, $P_{k_0,k_0} = \mu_0$ and
\[
\lim_{k\to\infty} \|\pi_{T(k,C)} - \pi_0\| = 0.
\]

Then we have

limk0!1

supk�k0

kPk0;k � �0k � limk0!1

supk>k0

k�1Xl=k0

k�T (l;C) � �T (l+1;C)k

= limk0!1

1Xl=k0

k�T (l;C) � �T (l+1;C)k = 0

The last term is 0 by (97) which completes the proof of Lemma A.1. Q.E.D.

Theorem 5.5 (Multi-Temperature Annealing) Assume that there exists an integer $\kappa \ge N$ such that for every $k = 0,1,2,\ldots$, $S \subseteq \{n_{k+1}, n_{k+2}, \ldots, n_{k+\kappa}\}$. Let $T(k,C)$ be any decreasing sequence of temperatures for which

1. There exists $T_k^{\inf}$ such that for all $C \in \mathcal{C}$: $T_k^{\inf} \le T(k,C)$.

2. $\lim_{k\to\infty} T(k,C) = 0$.

3. For all $k \ge k_0$, for some integer $k_0 \ge 2$: $T_k^{\inf} \ge N\Delta/\log k$.

Then for any starting configuration $\eta \in \Omega$ and for every $\omega \in \Omega$:
\[
\lim_{k\to\infty} P(X(k) = \omega \mid X(0) = \eta) = \pi_0(\omega). \tag{98}
\]


Proof of the theorem:

Using the two lemmas above, we can easily prove the annealing theorem:

\[
\begin{aligned}
\lim_{k\to\infty} \|P(X(k)=\cdot \mid X(0)=\eta) - \pi_0\| &= \lim_{k_0\to\infty}\ \lim_{\substack{k\to\infty\\ k\ge k_0}} \Bigl\| \sum_{\eta'} P(k,\cdot \mid k_0,\eta')\,P(k_0,\eta' \mid 0,\eta) - \pi_0 \Bigr\|\\
&\le \lim_{k_0\to\infty}\ \lim_{\substack{k\to\infty\\ k\ge k_0}} \Bigl\| \sum_{\eta'} P(k,\cdot \mid k_0,\eta')\,P(k_0,\eta' \mid 0,\eta) - P(k,\cdot \mid k_0,\pi_0) \Bigr\|\\
&\quad + \lim_{k_0\to\infty}\ \lim_{\substack{k\to\infty\\ k\ge k_0}} \bigl\| P(k,\cdot \mid k_0,\pi_0) - \pi_0 \bigr\|.
\end{aligned}
\]

The last term is 0 by Lemma A.2. Moreover, $P(k_0,\cdot \mid 0,\eta)$ and $\pi_0$ have total mass 1, thus:
\[
\begin{aligned}
\Bigl\| \sum_{\eta'} P(k,\cdot \mid k_0,\eta')\,P(k_0,\eta' \mid 0,\eta) - P(k,\cdot \mid k_0,\pi_0) \Bigr\| &= \sum_{\omega} \sup_{\eta''} \Bigl| \sum_{\eta'} \bigl(P(k,\omega \mid k_0,\eta') - P(k,\omega \mid k_0,\eta'')\bigr)\bigl(P(k_0,\eta' \mid 0,\eta) - \pi_0(\eta')\bigr) \Bigr|\\
&\le 2 \sum_{\omega} \sup_{\eta',\eta''} \bigl| P(k,\omega \mid k_0,\eta') - P(k,\omega \mid k_0,\eta'') \bigr|.
\end{aligned}
\]

Finally,
\[
\lim_{k\to\infty} \|P(X(k)=\cdot \mid X(0)=\eta) - \pi_0\| \le 2 \sum_{\omega}\ \lim_{k_0\to\infty}\ \lim_{\substack{k\to\infty\\ k\ge k_0}}\ \sup_{\eta',\eta''} \bigl| P(k,\omega \mid k_0,\eta') - P(k,\omega \mid k_0,\eta'') \bigr| = 0.
\]
The last term is 0 by Lemma A.1, which completes the proof of the annealing theorem. Q.E.D.


B Images


Figure 16: Results on a synthetic image with 2 classes (panels: original image; noisy image (SNR = -5 dB); ICM and Gibbs sampler results with the monogrid, multiscale and hierarchical models).


Figure 17: Results on a synthetic image with 4 classes (panels: original image; noisy image (SNR = 3 dB); ICM and Gibbs sampler results with the monogrid, multiscale and hierarchical models).


Figure 18: Results on a synthetic image with 16 classes (panels: original image; noisy image (SNR = 10 dB); ICM and Gibbs sampler results with the monogrid, multiscale and hierarchical models).


Figure 19: Results on a SPOT image with 4 classes (panels: original image; ICM and Gibbs sampler results with the monogrid, multiscale and hierarchical models).


Figure 20: Results on an indoor scene with 4 classes (panels: original image; ICM and Gibbs sampler results with the monogrid, multiscale and hierarchical models).


Figure 21: Results on a medical image with 4 classes (panels: original image; ICM and Gibbs sampler results with the monogrid, multiscale and hierarchical models).


Figure 22: Original SPOT image with 6 classes.


Figure 23: Ground truth data.


Figure 24: Results of the ICM algorithm (panels: monogrid model, ground truth data, multiscale model, hierarchical model). Comparison with ground truth data.


Figure 25: Results of the Gibbs sampler (panels: monogrid model, ground truth data, multiscale model, hierarchical model). Comparison with ground truth data.


C Tables


               levels  VPR   T0       iter.  total time    time/iter.   error           β    γ
original
  Gibbs        1       2     4        62     1.91 sec.     0.03 sec.    260 (1.59%)     0.9  --
  ICM          1       2     1        8      0.077 sec.    0.009 sec.   1547 (9.44%)    0.9  --
multiscale
  Gibbs        4       1,2   4        136    3.25 sec.     0.02 sec.    236 (1.44%)     0.7  --
  ICM          4       1,2   1        18     0.14 sec.     0.008 sec.   465 (2.83%)     0.7  --
hierarchical
  Gibbs        4       4     4,3,2,1  23     50.1 sec.     2.18 sec.    115 (0.7%)      0.7  0.3
  ICM          4       4     1        11     16.6 sec.     1.5 sec.     300 (1.83%)     0.7  0.3

Table 1: Results on a noisy synthetic image with 2 classes

               levels  VPR   T0       iter.  total time    time/iter.   error            β    γ
original
  Gibbs        1       2     4        68     3.01 sec.     0.04 sec.    183 (1.12%)      1.0  --
  ICM          1       2     1        9      0.15 sec.     0.02 sec.    2948 (17.99%)    1.0  --
multiscale
  Gibbs        4       1,2   4        101    3.85 sec.     0.04 sec.    176 (1.07%)      1.0  --
  ICM          4       1,2   1        17     0.22 sec.     0.01 sec.    1657 (10.11%)    0.9  --
hierarchical
  Gibbs        4       4     4,3,2,1  41     141.97 sec.   3.46 sec.    191 (1.16%)      0.7  0.1
  ICM          4       4     1        11     30.17 sec.    2.74 sec.    293 (1.78%)      0.8  0.5

Table 2: Results on a noisy synthetic image with 4 classes

               levels  VPR   T0       iter.  total time     time/iter.   error            β    γ
original
  Gibbs        1       2     4        201    22.96 sec.     0.12 sec.    340 (2.08%)      1.0  --
  ICM          1       2     1        10     0.55 sec.      0.05 sec.    8721 (53.22%)    1.0  --
multiscale
  Gibbs        4       1,2   4        337    32.97 sec.     0.1 sec.     331 (2.02%)      1.0  --
  ICM          4       1,2   1        15     0.68 sec.      0.05 sec.    5198 (31.73%)    1.0  --
hierarchical
  Gibbs        4       4     4,3,2,1  107    1169.46 sec.   10.93 sec.   316 (1.93%)      1.0  0.2
  ICM          4       4     1        16     162.87 sec.    10.18 sec.   795 (4.85%)      1.0  0.5

Table 3: Results on a noisy synthetic image with 16 classes


               levels  VPR   T0       iter.  total time    time/iter.   β    γ
original
  Gibbs        1       8     4        64     9.06 sec.     0.14 sec.    2.0  --
  ICM          1       8     1        7      0.33 sec.     0.047 sec.   2.0  --
multiscale
  Gibbs        4       1-8   4        106    10.22 sec.    0.09 sec.    2.0  --
  ICM          4       1-8   1        37     1.14 sec.     0.03 sec.    1.0  --
hierarchical
  Gibbs        4       16    4,3,2,1  29     353.54 sec.   12.19 sec.   2.0  0.4
  ICM          4       16    1        6      58.59 sec.    9.76 sec.    0.5  0.6

Table 4: Results on a SPOT image with 4 classes

               levels  VPR   T0       iter.  total time     time/iter.   β    γ
original
  Gibbs        1       32    4        234    163.18 sec.    0.69 sec.    1.5  --
  ICM          1       32    1        8      2.03 sec.      0.25 sec.    1.5  --
multiscale
  Gibbs        5       1-32  4        580    180.17 sec.    0.31 sec.    1.5  --
  ICM          5       1-32  1        36     5.15 sec.      0.14 sec.    0.3  --
hierarchical
  Gibbs        5       64    4,3,2,1  154    9629.33 sec.   62.53 sec.   0.7  0.1
  ICM          5       64    1        16     915.99 sec.    57.25 sec.   1.0  0.2

Table 5: Results on the SPOT image with 6 classes


References

[1] R. Azencott. Markov fields and image analysis. Proc. AFCET, Antibes, 1987.

[2] R. Azencott. Parallel Simulated Annealing: an overview of basic techniques, pages 37-46. Wiley, 1992.

[3] M. Berthod, G. Giraudon, and J. Stromboni. Deterministic Pseudo-Annealing: a Suboptimal Scheme for Optimization in Markov Random Fields. An application to pixel classification. In Proc. ECCV, Santa Margherita Ligure, Italy, May 1992.

[4] J. Besag. On the statistical analysis of dirty pictures. Jl. Roy. Statis. Soc. B., 1986.

[5] J. E. Besag. Spatial interaction and the statistical analysis of lattice systems (with discussion). Jl. Roy. Statis. Soc. B., 36:192-236, 1974.

[6] G. Bilbro, W. Snyder, S. Garnier, and I. Gault. Mean field annealing: a formalism for constructing GNC-like algorithms. IEEE Trans. on Neural Networks, 3(1), Jan. 1992.

[7] A. Blake and A. Zisserman. Visual Reconstruction. MIT Press, Cambridge, MA, 1987.

[8] C. Bouman. A Multiscale Image Model for Bayesian Image Segmentation. Technical Report TR-EE 91-53, Purdue University, 1991.

[9] C. Bouman and B. Liu. Multiple Resolution Segmentation of Texture Images. IEEE Trans. on Pattern Analysis and Machine Intelligence, 13:99-113, 1991.

[10] B. Chalmond. Image restoration using an estimated Markov model. Signal Processing, 15:115-129, 1988.

[11] P. Chou and C. Brown. The theory and practice of Bayesian image labeling. International Journal of Computer Vision, 4:185-210, 1990.

[12] D. Geiger and F. Girosi. Parallel and deterministic algorithms for MRFs: surface reconstruction and integration. In Proc. ECCV90, pages 89-98, Antibes, France, 1990.

[13] S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Trans. on Pattern Analysis and Machine Intelligence, 6:721-741, 1984.

[14] S. Geman and C. Graffigne. Markov Random Field Image Models and their Application to Computer Vision. Research Report, Brown University, 1986.

[15] C. Graffigne. A parallel simulated annealing algorithm. Research Report, CNRS, Université Paris-Sud, 1984.

[16] F. Heitz, P. Perez, E. Memin, and P. Bouthemy. Parallel Visual Motion Analysis Using Multiscale Markov Random Fields. In Proc. Workshop on Motion, Princeton, Oct. 1991.

[17] W. D. Hillis. The Connection Machine. MIT Press, 1985.


[18] F. C. Jeng and J. M. Woods. Compound Gauss-Markov Random Fields for Image Estimation. IEEE Trans. Acoust., Speech and Signal Proc., ASSP-39:638-697, 1991.

[19] Z. Kato, M. Berthod, and J. Zerubia. Multiscale Markov random field models for parallel image classification. In Proc. ICCV, Berlin, May 1993.

[20] Z. Kato, M. Berthod, and J. Zerubia. Parallel image classification using multiscale Markov Random Fields. In Proc. ICASSP, Minneapolis, Apr. 1993.

[21] Z. Kato, J. Zerubia, and M. Berthod. Bayesian image classification using Markov Random Fields. In Proc. Maxent Workshop, Paris, July 1992.

[22] Z. Kato, J. Zerubia, and M. Berthod. Image classification using Markov Random Fields with two new relaxation methods: Deterministic pseudo annealing and modified Metropolis dynamics. Research Report 1606, INRIA, Feb. 1992.

[23] Z. Kato, J. Zerubia, and M. Berthod. Satellite image classification using a modified Metropolis dynamics. In Proc. ICASSP, San Francisco, California, USA, Mar. 1992.

[24] P. V. Laarhoven and E. Aarts. Simulated Annealing: Theory and Applications. Reidel Pub., Dordrecht, Holland, 1987.

[25] S. Liu-Yu and M. Berthod. Efficient use of contextual information in relaxation labelling. In Proc. ICPR, The Hague, Netherlands, Sept. 1992.

[26] F. Marques, J. Cunillera, and A. Gasull. Hierarchical Segmentation Using Compound Gauss-Markov Random Fields. In Proc. ICASSP, San Francisco, Mar. 1992.

[27] N. Metropolis, A. Rosenbluth, M. Rosenbluth, A. Teller, and E. Teller. Equation of state calculations by fast computing machines. J. of Chem. Physics, 21:1087-1092, 1953.

[28] P. Perez and F. Heitz. Multiscale Markov Random Fields and constrained relaxation in low level image analysis. In Proc. ICASSP, San Francisco, Mar. 1992.

[29] Y. Rosanov. Markov Random Fields. Springer Verlag, 1982.

[30] J. Zerubia and R. Chellappa. Mean field annealing using Compound Gauss-Markov Random Fields for edge detection and image estimation. IEEE Trans. on Neural Networks, 8(4), July 1993.


Errata of August 1994

In Theorem 5.5 (pages 22 and 33), there is a problem: the conditions of the theorem permit choosing temperatures such that the original optimization criterion changes. The following example³ illustrates this problem:

Example 1. Consider the cost function:
\[
U(X) = (X_1 - 9)^2 + (X_1 - X_2)^2 + X_2^2 \tag{xcix}
\]

where $X = (X_1, X_2)$ is a MRF with cliques $\{X_1\}$, $\{X_2\}$ and $\{X_1,X_2\}$. The state space is simply $\{0,\ldots,9\}$. Using Theorem 5.5, these cliques may have the following temperature functions:
\[
T(k,\{X_1\}) = \frac{3\cdot 10^{6}\,\Delta}{\log k},\qquad T(k,\{X_2\}) = \frac{3\Delta}{\log k}\qquad\text{and}\qquad T(k,\{X_1,X_2\}) = \frac{3\Delta}{\log k},
\]

yielding the following energy function:

\[
U_{MTA}(X,k) = \bigl(10^{-6}(X_1-9)^2 + (X_1-X_2)^2 + X_2^2\bigr)\,\frac{\log k}{3\Delta},
\]

which is nothing else but a conventional annealing with the modified energy function:
\[
U'(X) = 10^{-6}(X_1-9)^2 + (X_1-X_2)^2 + X_2^2.
\]

However, the minimum of $U(X)$ is $X = (6,3)$ whereas the minimum of $U'(X)$ is $X = (0,0)$.
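The counterexample is small enough to verify by exhaustive enumeration; a quick sketch:

```python
# brute-force check of the counterexample over the 10 x 10 state space
from itertools import product

U     = lambda x1, x2: (x1 - 9) ** 2 + (x1 - x2) ** 2 + x2 ** 2
U_mod = lambda x1, x2: 1e-6 * (x1 - 9) ** 2 + (x1 - x2) ** 2 + x2 ** 2

states = list(product(range(10), repeat=2))
print(min(states, key=lambda x: U(*x)))      # (6, 3): the true optimum
print(min(states, key=lambda x: U_mod(*x)))  # (0, 0): the optimum has moved
```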

This mistake is due to the decomposition of $U(\omega) \oslash T(k,C)$ defined in Equation (68) (page 21). Let $\omega' \in \Omega_{\mathrm{opt}}$ be a globally optimal configuration. Thus $U(\omega') - U^{\inf}$ equals 0 ($U^{\inf}$ has been defined in Equation (74) on page 21). In the case of a classical annealing, dividing by a constant temperature does not change this relation (obviously, $\forall k:\ (U(\omega') - U^{\inf})/T(k)$ is still 0). But it is not necessarily true that $(U(\omega') - U^{\inf}) \oslash T(k,C)$ is also 0! By choosing sufficiently small temperatures for the cliques where $\omega'_C$ is locally not optimal (i.e. strengthening the non-optimal cliques) and sufficiently high temperatures for the cliques where $\omega'_C$ is locally optimal (i.e. weakening the optimal cliques), we obtain $(U(\omega') - U^{\inf}) \oslash T(k,C) > 0$, meaning that $\omega'$ is no longer globally optimal.

Before modifying the theorem, let us introduce some useful notations. First, let us examine the decomposition over the cliques of $U(\omega) - U(\eta)$ for arbitrary $\omega$ and $\eta$, $\omega \neq \eta$:
\[
U(\omega) - U(\eta) = \sum_{C\in\mathcal{C}} (V_C(\omega) - V_C(\eta)).
\]

Indeed, there may be negative and positive terms in the decomposition. Accordingly, we get:
\[
\sum_{C\in\mathcal{C}} (V_C(\omega) - V_C(\eta)) = \underbrace{\sum_{C\in\mathcal{C}:\ V_C(\omega)-V_C(\eta) < 0} (V_C(\omega) - V_C(\eta))}_{\Delta^-(\omega,\eta)} + \underbrace{\sum_{C\in\mathcal{C}:\ V_C(\omega)-V_C(\eta) \ge 0} (V_C(\omega) - V_C(\eta))}_{\Delta^+(\omega,\eta)}. \tag{c}
\]

³ We would like to thank the second reviewer of the paper we have submitted to CVGIP: Graphical Models and Image Processing for pointing out a gap in the theorem. We present here his counterexample.


Now, let us examine $\Delta$ defined in Equation (74) (page 21):
\[
\Delta = U^{\sup} - U^{\inf}.
\]
If we want to decompose $\Delta$ as defined above, we have to choose some configuration $\omega'$ with a maximum energy (i.e. $U(\omega') = U^{\sup}$) and another configuration $\omega''$ with a minimum energy (i.e. $U(\omega'') = U^{\inf}$). Obviously, there may be more than one decomposition, depending on the number of globally optimal configurations ($|\Omega_{\mathrm{opt}}|$) and the number of configurations with maximal global energy ($|\Omega_{\sup}|$). Thus, the decomposition of $\Delta$ for a given $(\omega',\omega'')$ is of the following form:
\[
\Delta = \Delta^-(\omega',\omega'') + \Delta^+(\omega',\omega'').
\]

Furthermore, let us define $\Delta^+_\ast$ as:
\[
\Delta^+_\ast = \min_{\omega'\in\Omega_{\sup},\ \omega''\in\Omega_{\mathrm{opt}}} \Delta^+(\omega',\omega'').
\]
Obviously $\Delta \le \Delta^+_\ast$.

The modified Theorem 5.5 is of the following form:

Theorem 5.5 (Multi-Temperature Annealing) Assume that there exists an integer $\kappa \ge N$ such that for every $k = 0,1,2,\ldots$, $S \subseteq \{n_{k+1}, n_{k+2}, \ldots, n_{k+\kappa}\}$. For all $C \in \mathcal{C}$, let $T(k,C)$ be any sequence of temperatures, decreasing in $k$, for which

1. $\lim_{k\to\infty} T(k,C) = 0$. Let us denote respectively by $T_k^{\sup}$ and $T_k^{\inf}$ the maximum and the minimum of the temperature function at $k$ ($\forall C \in \mathcal{C}:\ T_k^{\inf} \le T(k,C) \le T_k^{\sup}$).

2. For all $k \ge k_0$, for some integer $k_0 \ge 2$: $T_k^{\inf} \ge N\Delta^+_\ast/\log k$.

3. If $\Delta^-(\omega,\omega') \neq 0$ for some $\omega \in \Omega \setminus \Omega_{\mathrm{opt}}$, $\omega' \in \Omega_{\mathrm{opt}}$, then a further condition must be imposed: for all $k$:
\[
\frac{T_k^{\sup} - T_k^{\inf}}{T_k^{\inf}} \le R \qquad\text{with}\qquad R = \min_{\substack{\omega\in\Omega\setminus\Omega_{\mathrm{opt}}\\ \omega'\in\Omega_{\mathrm{opt}}\\ \Delta^-(\omega,\omega')\neq 0}} \frac{U(\omega) - U^{\inf}}{|\Delta^-(\omega,\omega')|}.
\]

Then for any starting configuration $\eta \in \Omega$ and for every $\omega \in \Omega$:
\[
\lim_{k\to\infty} P(X(k) = \omega \mid X(0) = \eta) = \pi_0(\omega).
\]

Remarks:

1. In practice, we cannot determine $R$ and $\Delta^+_\ast$, since we cannot compute $\Delta$ either.


2. Considering $\Delta^+_\ast$ in condition 2, we have the same problem as in the case of a classical annealing (the only difference is that in a classical annealing we have $\Delta$ instead of $\Delta^+_\ast$). Consequently, the same solutions may be used (an exponential schedule with a sufficiently high initial temperature).

3. The factor $R$ is more interesting. We propose herein two possibilities which can be used for practical implementations of the method: either we choose a sufficiently small interval $[T_0^{\inf}, T_0^{\sup}]$ and suppose that it satisfies condition 3 (we have used this technique in the simulations described in the report), or we use a stricter but easily verifiable condition instead of condition 3, namely:
\[
\lim_{k\to\infty} \frac{T_k^{\sup} - T_k^{\inf}}{T_k^{\inf}} = 0.
\]

4. What happens if $\Delta^-(\omega,\omega')$ is zero for all $\omega$ and $\omega'$ in condition 3, so that $R$ is not defined? This is the best case, because it means that all globally optimal configurations are also locally optimal. That is, we have no restriction on the interval $[T_k^{\inf}, T_k^{\sup}]$; thus any local temperature schedule satisfying conditions 1-2 is good.

We have done a computer simulation of the counterexample using the modified theorem:

Example 2. Since the energy function in Equation (xcix) is simple, we can compute it for each possible configuration (cf. Table 6). We obtain $R = 0.042553$, $\Delta = 216$ and $\Delta^+_\ast = 216$. Now, let us define a temperature schedule according to the modified theorem:
\[
T(k,\{X_1\}) = \frac{1 \cdot 216 \cdot N}{\log k},\qquad T(k,\{X_2\}) = \frac{1.04 \cdot 216 \cdot N}{\log k}\qquad\text{and}\qquad T(k,\{X_1,X_2\}) = \frac{1.02 \cdot 216 \cdot N}{\log k}.
\]

The modified energy function is of the following form:
\[
U_{MTA}(X,k) = \left( (X_1-9)^2 + \frac{(X_1-X_2)^2}{1.02} + \frac{X_2^2}{1.04} \right) \frac{\log k}{216 \cdot N},
\]

which is equivalent to the following energy function using classical annealing:
\[
U'(X) = (X_1-9)^2 + \frac{(X_1-X_2)^2}{1.02} + \frac{X_2^2}{1.04}.
\]

The values of $U'$ have also been computed, as shown in Table 7. Clearly, we obtain the same minimum as for the original function $U$, that is $X = (6,3)$.
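The quantities entering the corrected theorem can likewise be checked by enumeration. A sketch under the stated clique decomposition (the three cliques carry the three terms of $U$); it reproduces $\Delta = \Delta^+_\ast = 216$ and confirms that the 4% temperature spread of Example 2 preserves the optimum:

```python
from itertools import product

V = (lambda x: (x[0] - 9) ** 2,      # clique {X1}
     lambda x: (x[0] - x[1]) ** 2,   # clique {X1, X2}
     lambda x: x[1] ** 2)            # clique {X2}
U = lambda x: sum(v(x) for v in V)

states = list(product(range(10), repeat=2))
U_inf, U_sup = min(map(U, states)), max(map(U, states))
opt = min(states, key=U)                       # (6, 3)
sup = max(states, key=U)                       # (0, 9)
print(U_sup - U_inf)                           # Delta = 216

# Delta^+ for the (sup, opt) pair: sum of the non-negative clique differences
print(sum(max(v(sup) - v(opt), 0) for v in V)) # Delta^+_* = 216

# the balanced schedule of Example 2 leaves the optimum unchanged:
U_mod = lambda x: V[0](x) + V[1](x) / 1.02 + V[2](x) / 1.04
print(min(states, key=U_mod))                  # (6, 3)
```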

Now, let us go back to the proof of the theorem. The following modifications are needed:

1. The lower bound for $\delta(k)$ given at the beginning of the proof of Lemma A.1 (page 29) is changed:
\[
\delta(k) \ge \frac{\exp(-U^{\sup} \oslash T(k,C))}{L\,\exp(-U^{\inf} \oslash T(k,C))} = \frac{\exp(-\Delta \oslash T(k,C))}{L}
\]
as before, but since $\Delta \le \Delta^+_\ast$ and $\Delta^+_\ast > 0$:
\[
\ge \frac{1}{L}\exp(-\Delta^+_\ast \oslash T(k,C)) \ge \frac{1}{L}\exp(-\Delta^+_\ast/T_k^{\inf}).
\]


X1\X2    0    1    2    3    4    5    6    7    8    9
0       81   83   89   99  113  131  153  179  209  243
1       65   65   69   77   89  105  125  149  177  209
2       53   51   53   59   69   83  101  123  149  179
3       45   41   41   45   53   65   81  101  125  153
4       41   35   33   35   41   51   65   83  105  131
5       41   33   29   29   33   41   53   69   89  113
6       45   35   29   27   29   35   45   59   77   99
7       53   41   33   29   29   33   41   53   69   89
8       65   51   41   35   33   35   41   51   65   83
9       81   65   53   45   41   41   45   53   65   81

Table 6: Original energy function U.

X1\X2    0    1    2    3    4    5    6    7    8    9
0       81   82   88   98  112  129  150  176  205  238
1       64   64   68   76   88  103  123  146  173  204
2       52   50   52   58   68   81   99  120  145  174
3       44   40   40   44   52   63   79   98  122  149
4       40   34   32   34   40   50   63   80  102  127
5       40   32   28   28   32   40   51   67   86  109
6       44   34   28   26   28   34   43   57   74   95
7       52   40   32   28   28   31   39   51   66   85
8       63   50   40   34   32   33   39   49   62   79
9       79   63   51   43   39   39   43   51   62   77

Table 7: Modified energy function U'.

2. Consequently, the lower bound at the end of Equation (93) (page 29) takes the following form:
\[
L^{-N} \exp\left( -\frac{N\Delta^+_\ast}{T_{k_0+l\kappa}^{\inf}} \right).
\]

3. After Equation (96) (page 32), the following paragraph has to be added: Equation (96) holds if $(U(\omega) - U^{\inf}) \oslash T(k,C) \ge 0$. Let us rewrite this inequality as
\[
\sum_{C\in\mathcal{C}} \frac{V_C(\omega) - V_C(\omega')}{T(k,C)} \ge 0 \tag{ci}
\]
where $\omega'$ is any globally optimal configuration (i.e. $\omega' \in \Omega_{\mathrm{opt}}$). While $V_C(\omega) - V_C(\omega')$ may be negative, $U(\omega) - U^{\inf}$ is always positive or zero (since $U^{\inf}$ is the energy minimum). We denote by $\Delta(\omega)$ the energy difference in Equation (ci) without the temperature. Obviously, it is non-negative:
\[
\Delta(\omega) = \sum_{C\in\mathcal{C}} (V_C(\omega) - V_C(\omega')) = U(\omega) - U^{\inf} \ge 0.
\]

Then, let us decompose $\Delta(\omega)$ according to Equation (c):
\[
\Delta(\omega) = \Delta^+(\omega,\omega') + \Delta^-(\omega,\omega'),
\]
from which:
\[
\Delta^+(\omega,\omega') = \Delta(\omega) - \Delta^-(\omega,\omega').
\]

Now, let us consider Equation (ci):
\[
\sum_{C\in\mathcal{C}} \frac{V_C(\omega) - V_C(\omega')}{T(k,C)} = \Delta^-(\omega,\omega') \oslash T(k,C) + \Delta^+(\omega,\omega') \oslash T(k,C) \ge \frac{\Delta^-(\omega,\omega')}{T_k^{\inf}} + \frac{\Delta^+(\omega,\omega')}{T_k^{\sup}} = \frac{\Delta^-(\omega,\omega')\,T_k^{\sup} + \Delta^+(\omega,\omega')\,T_k^{\inf}}{T_k^{\inf}\,T_k^{\sup}} \ge 0.
\]

Furthermore:

��(!; !0) � T supk +�+(!; !0) � T inf

k = ��(!; !0) � T supk + (�(!)� ��(!; !0))T inf

k

Therefore, it is sufficient that:
\[
\Delta^-(\omega,\omega')\,(T_k^{\sup} - T_k^{\inf}) + \Delta(\omega)\,T_k^{\inf} \ge 0.
\]
Dividing by $\Delta^-(\omega,\omega')$, which is negative, we get:
\[
T_k^{\sup} - T_k^{\inf} \le \frac{\Delta(\omega)}{|\Delta^-(\omega,\omega')|}\,T_k^{\inf},
\]
which is true due to condition 3 of the theorem.
