Efficient Linear Programming for Dense CRFs Thalaiyasingam Ajanthan Alban Desmaison Rudy Bunel Mathieu Salzmann Philip H. S. Torr M. Pawan Kumar Oxford, December 2016


Page 1: Efficient Linear Programming for Dense CRFs

Efficient Linear Programming for Dense CRFs

Thalaiyasingam Ajanthan, Alban Desmaison, Rudy Bunel, Mathieu Salzmann, Philip H. S. Torr

M. Pawan Kumar

Oxford, December 2016

Page 2: Efficient Linear Programming for Dense CRFs

Dense CRF

I Fully connected CRF with Gaussian pairwise potentials.

E(x) = \sum_{a=1}^{n} \phi_a(x_a) + \sum_{a=1}^{n} \sum_{\substack{b=1\\ b\neq a}}^{n} \psi_{ab}(x_a, x_b),

\psi_{ab}(x_a, x_b) = \underbrace{\mu(x_a, x_b)}_{\text{Label compatibility}} \underbrace{\exp\!\left(-\tfrac{\|f_a - f_b\|^2}{2}\right)}_{\text{Pixel compatibility}},

where x_a \in \mathcal{L} and f_a \in \mathbb{R}^d.

Why?

I Captures long-range interactions and provides fine-grained segmentations [Krahenbuhl-2011].
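The energy above can be evaluated naively in O(n²) time; a minimal sketch (illustrative names, not the authors' code), with the label compatibility supplied as a matrix:

```python
import numpy as np

def dense_crf_energy(x, unary, features, mu):
    """Naive O(n^2) evaluation of E(x) for a fully connected CRF.

    x        : (n,) integer label assignment
    unary    : (n, L) unary potentials, unary[a, i] = phi_a(i)
    features : (n, d) per-pixel feature vectors f_a
    mu       : (L, L) label-compatibility matrix mu(i, j)
    """
    n = len(x)
    energy = float(unary[np.arange(n), x].sum())      # sum_a phi_a(x_a)
    for a in range(n):
        for b in range(n):
            if a == b:
                continue
            diff = features[a] - features[b]
            k_ab = np.exp(-0.5 * float(diff @ diff))  # Gaussian pixel compatibility
            energy += mu[x[a], x[b]] * k_ab           # psi_ab(x_a, x_b)
    return energy
```

This quadratic cost in the number of pixels is exactly what makes filtering-based approximations necessary at image scale.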


Page 4: Efficient Linear Programming for Dense CRFs

Dense CRF

E(x) = \sum_{a=1}^{n} \phi_a(x_a) + \sum_{a=1}^{n} \sum_{\substack{b=1\\ b\neq a}}^{n} \psi_{ab}(x_a, x_b), \qquad \psi_{ab}(x_a, x_b) = \mu(x_a, x_b) \exp\!\left(-\tfrac{\|f_a - f_b\|^2}{2}\right).

Difficulty

I Requires O(n²) computations ⇒ Infeasible.

Idea

I Approximate using the filtering method [Adams-2010] ⇒ O(n) computations.


Page 6: Efficient Linear Programming for Dense CRFs

Current algorithms for MAP inference in dense CRFs

I Rely on the efficient filtering method [Adams-2010].

Algorithm                        Time complexity per iteration   Theoretical bound
Mean Field (MF) [1]              O(n)                            No
Quadratic Programming (QP) [2]   O(n)                            Yes
Difference of Convex (DC) [2]    O(n)                            Yes
Linear Programming (LP) [2]      O(n log(n))                     Yes (best)

Contribution

I LP in O(n) time per iteration ⇒ An order of magnitude speedup.

[1] [Krahenbuhl-2011]

[2] [Desmaison-2016]


Page 8: Efficient Linear Programming for Dense CRFs

LP minimization

Current method

I Projected subgradient descent ⇒ Too slow.
I Linearithmic time per iteration.
I Expensive line search.
I Requires a large number of iterations.

Our algorithm

I Proximal minimization using block coordinate descent.
I One block: Significantly smaller subproblems.
I The other block: Efficient conditional gradient descent.
I Linear time conditional gradient computation.
I Optimal step size.

I Guarantees optimality and converges faster.


Page 10: Efficient Linear Programming for Dense CRFs

LP relaxation of a dense CRF

\min_{y} \; E(y) = \sum_{a} \sum_{i} \phi_{a:i}\, y_{a:i} + \sum_{a, b \neq a} \sum_{i} K_{ab}\, \frac{|y_{a:i} - y_{b:i}|}{2},

\text{s.t. } y \in \mathcal{M} = \left\{ y \;\middle|\; \sum_{i} y_{a:i} = 1,\; a \in \{1 \dots n\};\;\; y_{a:i} \in [0, 1],\; a \in \{1 \dots n\},\, i \in \mathcal{L} \right\},

where K_{ab} = \exp\!\left(-\tfrac{\|f_a - f_b\|^2}{2}\right).

Assumption: Label compatibility is the Potts model.

Standard solvers would require O(n²) variables.
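On small problems the relaxed objective can be checked directly; a naive O(n²) sketch under the Potts assumption (names are illustrative):

```python
import numpy as np

def lp_energy(y, unary, K):
    """Naive evaluation of the LP relaxation objective with a Potts model.

    y     : (n, L) relaxed labeling; each row lies on the probability simplex
    unary : (n, L) unary potentials phi_{a:i}
    K     : (n, n) Gaussian kernel values K_ab (the diagonal is skipped)
    """
    n, _ = y.shape
    energy = float((unary * y).sum())
    for a in range(n):
        for b in range(n):
            if a != b:
                # Potts pairwise term: K_ab * sum_i |y_{a:i} - y_{b:i}| / 2
                energy += K[a, b] * float(np.abs(y[a] - y[b]).sum()) / 2.0
    return energy
```

For an integral labeling y this reduces to the original Potts energy, which is why the relaxation is exact on integral points.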


Page 12: Efficient Linear Programming for Dense CRFs

Proximal minimization of LP

\min_{y} \; E(y) + \frac{1}{2\lambda} \left\| y - y^{k} \right\|^{2}, \quad \text{s.t. } y \in \mathcal{M},

where y^k is the current estimate and \lambda > 0 sets the proximal step size.

Why?

I Initialization using MF or DC.

I Smooth dual ⇒ Sophisticated optimization.

Approach

I Block coordinate descent tailored to this problem.
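The proximal objective itself is cheap to form once E(y) is available; a short sketch (λ is the proximal regularization weight, names illustrative):

```python
import numpy as np

def proximal_objective(y, y_k, lam, energy_fn):
    """Proximal term added to the LP energy: E(y) + ||y - y_k||^2 / (2*lam).

    energy_fn : callable evaluating the LP relaxation energy E(y)
    y_k       : current estimate; lam > 0 controls the step size
    """
    return energy_fn(y) + float(np.sum((y - y_k) ** 2)) / (2.0 * lam)
```

The quadratic term is what makes the dual smooth, enabling the block coordinate descent described next.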


Page 15: Efficient Linear Programming for Dense CRFs

Dual of the proximal problem

\min_{\alpha, \beta, \gamma} \; g(\alpha, \beta, \gamma) = \frac{\lambda}{2} \left\| A\alpha + B\beta + \gamma - \phi \right\|^{2} + \left\langle A\alpha + B\beta + \gamma - \phi,\; y^{k} \right\rangle - \langle \mathbf{1}, \beta \rangle,

\text{s.t. } \gamma_{a:i} \geq 0 \quad \forall\, a \in \{1 \dots n\},\; \forall\, i \in \mathcal{L},

\alpha \in \mathcal{C} = \left\{ \alpha \;\middle|\; \alpha^{1}_{ab:i} + \alpha^{2}_{ab:i} = \tfrac{K_{ab}}{2},\;\; \alpha^{1}_{ab:i}, \alpha^{2}_{ab:i} \geq 0,\;\; a \neq b,\; i \in \mathcal{L} \right\}.

Block coordinate descent

I α ∈ C: Compact domain ⇒ Conditional gradient descent.

I β: Unconstrained ⇒ Set derivative to zero.

I γ: Unbounded and separable ⇒ Small QP for each pixel.

Guarantees optimality since g is strictly convex and smooth.


Page 18: Efficient Linear Programming for Dense CRFs

Conditional gradient computation

\forall\, a \in \{1 \dots n\}, \quad v'_a = \sum_{b} K_{ab}\, v_b\, \mathbf{1}[y_a \geq y_b],

where y_a, y_b \in [0, 1].

Difficulty

I The permutohedral lattice based filtering method of [Adams-2010] cannot handle the ordering constraint.

Current method [Desmaison-2016]

I Repeated application of the original filtering method using a divide-and-conquer strategy ⇒ O(d²n log(n)) computations¹.

Our idea

I Discretize the interval [0, 1] into H levels and instantiate H permutohedral lattices ⇒ O(Hdn) computations².

¹d - filter dimension, usually 2 or 5.
²H = 10 in our case.


Page 22: Efficient Linear Programming for Dense CRFs

Segmentation results

[Figure: assignment energy vs. time curves for MF, DCneg, SG-LPℓ, and PROX-LPacc; left vertical axis ×10⁶, right vertical axis ×10⁵.]

Assignment energy as a function of time for an image in (left) MSRC and (right) Pascal.

I Both LP minimization algorithms are initialized with DCneg.

Page 23: Efficient Linear Programming for Dense CRFs

Segmentation results

                 Ave. E (×10³)   Ave. T (s)    Acc.
MSRC
  MF5                 8078.0          0.2     79.33
  MF                  8062.4          0.5     79.35
  DCneg               3539.6          1.3     83.01
  SG-LPℓ              3335.6         13.6     83.15
  PROX-LPacc          1340.0          3.7     84.16
Pascal
  MF5                 1220.8          0.8     79.13
  MF                  1220.8          0.7     79.13
  DCneg                629.5          3.7     80.43
  SG-LPℓ               617.1         84.4     80.49
  PROX-LPacc           507.7         14.7     80.58

Results on the MSRC and Pascal datasets.

Page 24: Efficient Linear Programming for Dense CRFs

Segmentation results

Image | MF | DCneg | SG-LPℓ | PROX-LPacc | Ground truth

Qualitative results on MSRC.

Page 25: Efficient Linear Programming for Dense CRFs

Modified filtering algorithm

[Figure: speedup vs. number of pixels (×10⁵) and vs. number of labels, for the spatial kernel (d = 2) and the bilateral kernel (d = 5).]

Speedup of our modified filtering algorithm over the divide-and-conquer strategy on a Pascal image.

Speedup is around 45–65 on the standard image.


Page 27: Efficient Linear Programming for Dense CRFs

Conclusion

I We have introduced the first LP minimization algorithm for dense CRFs whose iterations are linear in the number of pixels and labels.

arXiv: https://arxiv.org/abs/1611.09718

Page 28: Efficient Linear Programming for Dense CRFs

Thank you!

Page 29: Efficient Linear Programming for Dense CRFs

Permutohedral lattice

A 2-dimensional hyperplane tessellated by the permutohedral lattice.

Page 30: Efficient Linear Programming for Dense CRFs

Modified filtering algorithm

Splat Blur Slice

Top row: Original filtering method. Bottom row: Our modified filtering method. H = 3.