probabilistic inference lecture 4 – part 1

Probabilistic InferenceLecture 4 – Part 1

M. Pawan Kumarpawan.kumar@ecp.fr

Slides available online http://cvc.centrale-ponts.fr/personnel/pawan/

Va Vb Vc

da db dc

Label l0

Label l1

D : Observed data (image)

V : Unobserved variables

L : Discrete, finite label set

Labeling f : V L

Va is assigned lf(a)

Va Vb Vc

Q(f; ) = ∑a a;f(a) + ∑(a,b)

ab;f(a)f(b)

Label l0

Label l1

bc;f(b)f(c

)a;f(a

) da db dc

Va Vb Vc

Q(f; ) = ∑a a;f(a) + ∑(a,b)

ab;f(a)f(b)

Label l0

Label l1

f* = argminf Q(f; )

da db dc

Va Vb Vc

Q(f; ) = ∑a a;f(a) + ∑(a,b)

ab;f(a)f(b)

Label l0

Label l1

f* = argminf Q(f; )

Outline

• Convex Optimization

• Integer Programming Formulation

• Convex Relaxations

• Comparison

• Generalization of Results

Mathematical Optimization

min g0(x)

s.t. gi(x) ≤ 0

hi(x) = 0

Objective function

Inequality constraints

Equality constraints

x is a feasible point gi(x) ≤ 0, hi(x) = 0

x is a strictly feasible point gi(x) < 0, hi(x) = 0

Feasible region - set of all feasible points

Convex Optimization

min g0(x)

s.t. gi(x) ≤ 0

hi(x) = 0

Objective function

Objective function is convex

Feasible region is convex

Convex set? Convex function?

Convex Set

c x1 + (1 - c) x2

c [0,1]

Line Segment

Endpoints

Convex Set

All points on the line segment lie within the set

For all line segments with endpoints in the set

Non-Convex Set

Examples of Convex Sets

Line Segment

Hyperplane aTx - b = 0

Halfspace aTx - b ≤ 0

Second-order Cone ||x|| ≤ t

Semidefinite Cone {X | X 0}

aTXa ≥ 0, for all a Rn

All eigenvalues of X are non-negative

aTX1a ≥ 0 aTX2a ≥ 0

aT(cX1 + (1-c)X2)a ≥ 0

Operations that Preserve ConvexityIntersection

Polyhedron / Polytope

Operations that Preserve ConvexityIntersection

Operations that Preserve ConvexityAffine Transformation x Ax + b

Convex Function

Blue point always lies above red point

Convex Function

g( c x1 + (1 - c) x2 ) ≤ c g(x1) + (1 - c) g(x2)

Domain of g(.) has to be convex

Convex Function

-g(.) is concave g( c x1 + (1 - c) x2 ) ≤ c g(x1) + (1 - c) g(x2)

Convex FunctionOnce-differentiable functions

g(y) + g(y)T (x - y) ≤ g(x)

(y,g(y))g(y) + g(y)T (x - y)

Twice-differentiable functions 2g(x) 0

Convex Function and Convex Sets

Epigraph of a convex function is a convex set

Examples of Convex Functions

Linear function aTx

p-Norm functions (x1p + x2

p + xnp)1/p, p ≥ 1

Quadratic functions xT Q x

Operations that Preserve ConvexityNon-negative weighted sum

+ w2 + ….

xT Q x + aTx + bQ 0

Operations that Preserve ConvexityPointwise maximum

Pointwise minimum of concave functions is concave

Convex Optimization

min g0(x)

s.t. gi(x) ≤ 0

hi(x) = 0

Objective function

Objective function is convex

Feasible region is convex

Linear Programming

min g0(x)

s.t. gi(x) ≤ 0

hi(x) = 0

Objective function

min g0Tx

s.t. giTx ≤ 0

hiTx = 0

Linear function

Linear constraints

Quadratic Programming

min g0(x)

s.t. gi(x) ≤ 0

hi(x) = 0

Objective function

min xTQx + aTx + b

s.t. giTx ≤ 0

hiTx = 0

Quadratic function

Linear constraints

Second-Order Cone Programming

min g0(x)

s.t. gi(x) ≤ 0

hi(x) = 0

Objective function

min g0Tx

s.t. xTQix + aiTx + bi ≤ 0

hiTx = 0

Linear function

Quadraticconstraints

Linear constraints

Semidefinite Programming

min g0(x)

s.t. gi(x) ≤ 0

hi(x) = 0

Objective function

min Q X

s.t. X 0

Ai X = 0

Linear function

Semidefinite constraints

Linear constraints

Outline

• Comparison

Integer Programming Formulation

0V1 V2

Label ‘0’

Label ‘1’Unary Cost

Unary Cost Vector u = [ 5

Cost of V1 = 0

Cost of V1 = 1

; 2 4 ]

Labelling m = {1 , 0}

0V1 V2

Label ‘0’

Unary Cost Vector u = [ 5 2 ; 2 4 ]T

Label vector x = [ -1

V1 = 1

; 1 -1 ]T

Recall that the aim is to find the optimal x

0V1 V2

Label ‘0’

Unary Cost Vector u = [ 5 2 ; 2 4 ]T

Label vector x = [ -1 1 ; 1 -1 ]T

Sum of Unary Costs = 12

∑i ui (1 + xi)

0V1 V2

Label ‘0’

Label ‘1’Pairwise Cost

0 Cost of V1 = 0 and V1 = 00

0Cost of V1 = 0 and V2 = 0

Cost of V1 = 0 and V2 = 11 0

Pairwise Cost Matrix P

0V1 V2

Label ‘0’

Sum of Pairwise Costs14

∑ij Pij (1 + xi)(1+xj)

0V1 V2

Label ‘0’

Sum of Pairwise Costs14

∑ij Pij (1 + xi +xj + xixj)

∑ij Pij (1 + xi + xj + Xij)=

X = x xT Xij = xi xj

Constraints

• Uniqueness Constraint

∑ xi = 2 - |L|i Va

• Integer Constraints

xi {-1,1}

X = x xT

x* = argmin 12

∑ ui (1 + xi) + 14

∑ Pij (1 + xi + xj + Xij)

∑ xi = 2 - |L|i Va

xi {-1,1}

X = x xT

Convex Non-Convex

Outline• Convex Optimization

• Convex Relaxations– Linear Programming (LP-S)– Semidefinite Programming (SDP-L)– Second Order Cone Programming (SOCP-MS)

• Comparison

x* = argmin 12

∑ ui (1 + xi) + 14

∑ xi = 2 - |L|i Va

xi {-1,1}

X = x xT

Retain Convex PartSchlesinger, 1976

Relax Non-ConvexConstraint

x* = argmin 12

∑ ui (1 + xi) + 14

∑ xi = 2 - |L|i Va

xi [-1,1]

X = x xT

Schlesinger, 1976

Xij [-1,1]

1 + xi + xj + Xij ≥ 0

∑ Xij = (2 - |L|) xij Vb

x* = argmin 12

∑ ui (1 + xi) + 14

∑ xi = 2 - |L|i Va

xi [-1,1]

X = x xT

x* = argmin 12

∑ ui (1 + xi) + 14

∑ xi = 2 - |L|i Va

xi [-1,1],

Xij [-1,1]

1 + xi + xj + Xij ≥ 0

∑ Xij = (2 - |L|) xij Vb

Outline• Convex Optimization

• Comparison

x* = argmin 12

∑ ui (1 + xi) + 14

∑ xi = 2 - |L|i Va

xi {-1,1}

X = x xT

Retain Convex PartLasserre, 2000

x* = argmin 12

∑ ui (1 + xi) + 14

∑ xi = 2 - |L|i Va

xi [-1,1]

X = x xT

Retain Convex Part

Lasserre, 2000

1 x1 x2 ... xn

Rank = 1

Xii = 1

Positive SemidefiniteConvex

Non-Convex

1 x1 x2 ... xn

Xii = 1

Positive SemidefiniteConvex

Schur’s Complement

BTA-1 I

0 C - BTA-1B

I A-1B

A 0 C -BTA-1B 0

X - xxT 0

0 X - xxT

Schur’s Complement

x* = argmin 12

∑ ui (1 + xi) + 14

∑ xi = 2 - |L|i Va

xi [-1,1]

X = x xT

Retain Convex Part

Lasserre, 2000

x* = argmin 12

∑ ui (1 + xi) + 14

∑ xi = 2 - |L|i Va

xi [-1,1]

Retain Convex Part

Xii = 1 X - xxT 0

Accurate

Inefficient

Lasserre, 2000

Outline

• Comparison

SOCP Relaxation

x* = argmin 12

∑ ui (1 + xi) + 14

∑ xi = 2 - |L|i Va

xi [-1,1]

Xii = 1 X - xxT 0

Derive SOCP relaxation from the SDP relaxation

Further Relaxation

1-D ExampleX - xxT 0

X - x2 ≥ 0

For two semidefinite matrices, Frobenius inner product is non-negative

SOC of the form || v ||2 st

2-D Example

X11 X12

X21 X22

x1x1 x1x2

x2x1 x2x2

xxT =x1

2 x1x2

2-D Example(X - xxT)

1 - x12 X12-x1x2

0 0 X12-x1x2 1 - x22

-1 x1 1

C1 0 C1 0

1 - x12 X12-x1x2

0 1 X12-x1x2 1 - x22

C2 0 C2 0

-1 x2 1

1 - x12 X12-x1x2

1 1 X12-x1x2 1 - x22

C3 0 C3 0

(x1 + x2)2 2 + 2X12

1 - x12 X12-x1x2

-1 1 X12-x1x2 1 - x22

C4 0 C4 0

(x1 - x2)2 2 - 2X12

SOCP Relaxation

Consider a matrix C1 = UUT 0

(X - xxT)

||UTx ||2 X C1

Continue for C2, C3, … , Cn

Kim and Kojima, 2000

SOCP RelaxationHow many constraints for SOCP = SDP ?

Infinite. For all C 0

Specify constraints similar to the 2-D example

(xi + xj)2 2 + 2Xij

(xi + xj)2 2 - 2Xij

SOCP-MS

x* = argmin 12

∑ ui (1 + xi) + 14

∑ xi = 2 - |L|i Va

xi [-1,1]

Xii = 1 X - xxT 0

Muramatsu and Suzuki, 2003

SOCP-MS

x* = argmin 12

∑ ui (1 + xi) + 14

∑ xi = 2 - |L|i Va

xi [-1,1]

(xi + xj)2 2 + 2Xij (xi - xj)2 2 - 2Xij

Specified only when Pij 0

Outline

• Comparison

Kumar, Kolmogorov and Torr, JMLR 2010

Dominating Relaxation

For all MAP Estimation problem (u, P)

A dominates B

Dominating relaxations are better

Equivalent Relaxations

A dominates B

B dominates A

For all MAP Estimation problem (u, P)

Strictly Dominating Relaxation

A dominates B

B does not dominate A

For at least one MAP Estimation problem (u, P)

SOCP-MS

(xi + xj)2 2 + 2Xij (xi - xj)2 2 - 2Xij

• Pij ≥ 0(xi + xj)2

2- 1Xij =

• Pij < 0(xi - xj)2

21 -Xij =

SOCP-MS is a QP

Same as QP by Ravikumar and Lafferty, 2005

SOCP-MS ≡ QP-RL

LP-S vs. SOCP-MSDiffer in the way they relax X = xxT

Xij [-1,1]

1 + xi + xj + Xij ≥ 0

∑ Xij = (2 - |L|) xij Vb

(xi + xj)2 2 + 2Xij

(xi - xj)2 2 - 2Xij

SOCP-MS

F(LP-S)

F(SOCP-MS)

LP-S vs. SOCP-MS

• LP-S strictly dominates SOCP-MS

• LP-S strictly dominates QP-RL

• Where have we gone wrong?

• A Quick Recap !

Recap of SOCP-MS

1 1C =

(xi + xj)2 2 + 2Xij

Recap of SOCP-MS

-1 1C =

(xi - xj)2 2 - 2Xij

Can we use different C matrices ??

Can we use a different subgraph ??

Outline

• Comparison

• Generalization of Results– SOCP Relaxations on Trees– SOCP Relaxations on Cycles

SOCP Relaxations on Trees

Choose any arbitrary tree

SOCP Relaxations on Trees

Choose any arbitrary C 0

Repeat over trees to get relaxation SOCP-T

LP-S strictly dominates SOCP-T

LP-S strictly dominates QP-T

Outline

• Comparison

• Generalization of Results– SOCP Relaxations on Trees– SOCP Relaxations on Cycles

SOCP Relaxations on Cycles

Choose an arbitrary even cycle

Pij ≥ 0 Pij ≤ 0OR

Choose any arbitrary C 0

Repeat over even cycles to get relaxation SOCP-E

LP-S strictly dominates SOCP-E

LP-S strictly dominates QP-E

• True for odd cycles with Pij ≤ 0

• True for odd cycles with Pij ≤ 0 for only one edge

• True for odd cycles with Pij ≥ 0 for only one edge

• True for all combinations of above cases

The SOCP-C Relaxation Include all LP-S constraints True SOCP

Submodular Non-submodular Submodular

The SOCP-C Relaxation

Include all LP-S constraints True SOCP0

Frustrated Cycle

LP-S Solution

0-1 -1

1 -1 1

Objective Function = 0

LP-S Solution

0-1 -1

1 -1 1

Define an SOC Constraint using C = 1

LP-S Solution

0-1 -1

1 -1 1

(xi + xj + xk)2 3 + 2 (Xij + Xjk + Xki)

SOCP-C Solution

Objective Function = 0.75

SOCP-C strictly dominates LP-S

0.6-0.8 -0.8

0.80.6

-0.60.4

-0.4 -0.4

0.3 0.3

The SOCP-Q Relaxation Include all cycle inequalities True SOCP

Clique of size n

Define an SOCP Constraint using C = 1

(Σ xi)2 ≤ n + (Σ Xij)

SOCP-Q strictly dominates LP-S

SOCP-Q strictly dominates SOCP-C

4-Neighbourhood MRF

Test SOCP-C

50 binary MRFs of size 30x30

u ≈ N (0,1)

P ≈ N (0,σ2)

4-Neighbourhood MRF

σ = 2.5

8-Neighbourhood MRF

Test SOCP-Q

50 binary MRFs of size 30x30

u ≈ N (0,1)

P ≈ N (0,σ2)

8-Neighbourhood MRF

σ = 1.125

Conclusions

• Large class of SOCP/QP dominated by LP-S

• New SOCP relaxations dominate LP-S

• But better LP relaxations exist

probabilistic inference lecture 4 – part 1

Documents

unit 6: simple linear regression lecture 2: outliers and...

bayesian inference - wellcome trust centre for …€¢ some...

automatisierte logik und programmierung ii · gui evaluator...

nmrbox.org nmrbox: trd 3 a probabilistic core as a coherent...

workshop on probabilistic models with determinantal...

variational inference via upper bound...

lecture 13a: probabilistic method - hacettepe

refined probabilistic abstraction

probabilistic inference in human semantic memory mark ...

causal inference

harnessing probabilistic programming for network...

probabilistic inference lecture 7

15 jvariational inference - ssl2.cms.fu-berlin.de ·...

lecture 3: inference for multinomial parameters

negnevitsky, pearson education, 2002 1 lecture 5 fuzzy...

lecture 8: inference on parameters - cmu...

machine learning probabilistic machine learning · machine...

probabilistic programming and optimizationarto klami...

probabilistic models

lecture 5 fuzzy expert...