
TRANSCRIPT

Page 1:

Sparse Representation and Compressed Sensing: Theory and Algorithms

Yi Ma¹,², Allen Yang³, John Wright¹

CVPR Tutorial, June 20, 2009

¹ Microsoft Research Asia
² University of Illinois at Urbana-Champaign
³ University of California, Berkeley

Page 2:

MOTIVATION – Applications to a variety of vision problems

• Face Recognition:

Wright et al PAMI ’09, Huang CVPR ’08, Wagner CVPR ’09 …

• Image Enhancement and Superresolution:

Elad TIP ’06, Huang CVPR ‘08, …

• Image Classification:

Mairal CVPR ‘08, Rodriguez ‘07, many others …

• Multiple Motion Segmentation:

Rao CVPR ’08, Elhamifar CVPR ’09 …

• … and many others, including this conference

Page 3:

MOTIVATION – Applications to a variety of vision problems

When and why can we expect such good performance?

A closer look at the theory …

Page 4:

SPARSE REPRESENTATION – Model problem

$y = Ax$

Observation $y \in \mathbb{R}^m$; matrix $A \in \mathbb{R}^{m \times n}$ with $m \ll n$; unknown $x \in \mathbb{R}^n$.

An underdetermined system of linear equations.

Two interpretations:

• Compressed sensing: A as sensing matrix

• Sparse representation: A as overcomplete dictionary

Page 5:

SPARSE REPRESENTATION – Model problem

$y = Ax$, with $y \in \mathbb{R}^m$, $A \in \mathbb{R}^{m \times n}$ ($m \ll n$), $x \in \mathbb{R}^n$: an underdetermined system of linear equations.

Many more unknowns than observations → no unique solution.

• Classical answer: the minimum $\ell_2$-norm solution, $\hat{x} = A^+ y$.

• Emerging applications: instead desire sparse solutions.
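To see concretely why the classical answer falls short, here is a minimal numpy sketch (the sizes and indices are arbitrary choices of ours): the minimum $\ell_2$-norm solution of an underdetermined system is generically dense, even when a very sparse solution exists.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 10, 30
A = rng.standard_normal((m, n))
x_sparse = np.zeros(n)
x_sparse[[3, 17]] = [1.0, -2.0]          # a 2-sparse ground truth
y = A @ x_sparse

# Classical answer: minimum l2-norm solution via the pseudoinverse.
x_l2 = np.linalg.pinv(A) @ y
print(np.sum(np.abs(x_l2) > 1e-6))       # typically 30: every entry nonzero
print(np.linalg.norm(A @ x_l2 - y))      # ~0: it does satisfy y = Ax
```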

Page 6:

SPARSE SOLUTIONS – Uniqueness

Look for the sparsest solution:

$\min \|x\|_0 \ \ \text{subject to}\ \ y = Ax,$

where $\|\cdot\|_0$ counts the number of nonzero elements.

Page 7:

SPARSE SOLUTIONS – Uniqueness

Look for the sparsest solution:

$\min \|x\|_0 \ \ \text{subject to}\ \ y = Ax.$

Is the sparsest solution unique?

spark(A): the size of the smallest set of linearly dependent columns of A.

If two distinct sparse solutions existed, say $y = A_1 x_1 = A_2 x_2$ with $A_1, A_2$ the columns supporting each, then $A_1 x_1 - A_2 x_2 = 0$: the union of the two supports contains a linearly dependent set of columns, which is exactly what spark controls.

Page 8:

SPARSE SOLUTIONS – Uniqueness

Proposition [Gorodnitsky & Rao ’97]: If $y = A x_0$ with $\|x_0\|_0 < \mathrm{spark}(A)/2$, then $x_0$ is the unique solution to $\min \|x\|_0$ subject to $y = Ax$.
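Since spark is defined combinatorially, it can be checked by brute force on tiny examples. A minimal sketch (ours; exponential-time, so only for toy sizes) that computes spark(A) and reads off the uniqueness guarantee:

```python
import numpy as np
from itertools import combinations

def spark(A, tol=1e-10):
    """Size of the smallest set of linearly dependent columns of A.
    Brute force over all column subsets: only sensible for tiny matrices."""
    m, n = A.shape
    for k in range(1, n + 1):
        for S in combinations(range(n), k):
            if np.linalg.matrix_rank(A[:, list(S)], tol=tol) < k:
                return k
    return n + 1  # all columns independent (only possible if n <= m)

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 8))   # generic columns: spark = m + 1 = 5
print(spark(A))                   # 5, so any x0 with ||x0||_0 <= 2 is the
                                  # unique sparsest solution to y = A x0
```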

Page 9:

SPARSE SOLUTIONS – So How Do We Compute It?

Looking for the sparsest solution:

$(P_0)\quad \min \|x\|_0 \ \ \text{subject to}\ \ y = Ax.$

Bad News: $(P_0)$ is NP-hard in the worst case, and hard to approximate within certain constants [Amaldi & Kann ’95].

Page 10:

SPARSE SOLUTIONS – So How Do We Compute It?

$(P_0)\quad \min \|x\|_0 \ \ \text{subject to}\ \ y = Ax.$

Maybe we can still solve important cases?

• Greedy algorithms: Matching Pursuit, Orthogonal Matching Pursuit [Mallat & Zhang ’93], CoSaMP [Needell & Tropp ’08] (a minimal OMP sketch follows below).

• Convex programming [Chen, Donoho & Saunders ’94].
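A short Orthogonal Matching Pursuit sketch in numpy, under the standard formulation (greedy column selection followed by a least-squares re-fit on the current support); the test instance below is our own arbitrary choice:

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal Matching Pursuit: greedily add the column most
    correlated with the residual, then refit by least squares."""
    n = A.shape[1]
    support, residual = [], y.astype(float)
    coef = np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(A.T @ residual)))   # best-matching column
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(n)
    x[support] = coef
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 200)); A /= np.linalg.norm(A, axis=0)
x0 = np.zeros(200)
x0[rng.choice(200, 5, replace=False)] = rng.standard_normal(5)
print(np.linalg.norm(omp(A, A @ x0, 5) - x0))   # tiny on easy instances
```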

Page 11:

SPARSE SOLUTIONS – The Heuristic

Looking for the sparsest solution:

$(P_0)\quad \min \|x\|_0 \ \ \text{subject to}\ \ y = Ax.$  (Intractable.)

↓ convex relaxation

$(P_1)\quad \min \|x\|_1 \ \ \text{subject to}\ \ y = Ax.$  (A linear program, solvable in polynomial time.)

Why $\ell_1$? It is the convex envelope of $\ell_0$ over the unit cube.

Rich applied history – geosciences, sparse coding in vision, statistics
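The LP encoding of $(P_1)$ is standard: introduce $t$ with $|x_i| \le t_i$ and minimize $\sum_i t_i$. A self-contained sketch using scipy's linprog (our illustration; the Gaussian test instance at the end previews the equivalence phenomenon of the next page):

```python
import numpy as np
from scipy.optimize import linprog

def l1_min(A, y):
    """Solve (P1): min ||x||_1 s.t. Ax = y, as an LP in variables (x, t)
    with |x_i| <= t_i encoded by x - t <= 0 and -x - t <= 0."""
    m, n = A.shape
    c = np.concatenate([np.zeros(n), np.ones(n)])   # minimize sum(t)
    I = np.eye(n)
    A_ub = np.vstack([np.hstack([I, -I]), np.hstack([-I, -I])])
    b_ub = np.zeros(2 * n)
    A_eq = np.hstack([A, np.zeros((m, n))])         # Ax = y
    bounds = [(None, None)] * n + [(0, None)] * n   # x free, t >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y,
                  bounds=bounds, method="highs")
    return res.x[:n]

# l1/l0 equivalence in action: exact recovery of a sparse x0 from few
# Gaussian measurements.
rng = np.random.default_rng(0)
m, n, k = 40, 100, 5
A = rng.standard_normal((m, n)) / np.sqrt(m)
x0 = np.zeros(n)
x0[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
print(np.linalg.norm(l1_min(A, A @ x0) - x0))   # tiny: exact recovery, w.h.p.
```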

Page 12:

EQUIVALENCE – A stronger motivation

In many cases, the solutions to $(P_0)$ and $(P_1)$ are exactly the same.

Theorem [Candes & Tao ’04, Donoho ’04]: For Gaussian $A$, with overwhelming probability,

$x_0 = \arg\min \|x\|_1 \ \ \text{subject to}\ \ Ax = Ax_0$

whenever $\|x_0\|_0 < \rho^\star m$.

“$\ell_1$-minimization recovers any sufficiently sparse solution.”

Page 13:

GUARANTEES – “Well-Spread” A

Mutual coherence: the largest absolute inner product between distinct (unit-normalized) columns of A,

$\mu(A) = \max_{i \neq j} |\langle a_i, a_j \rangle|.$

Low mutual coherence: the columns are well-spread in the space.

Page 14:

GUARANTEES – “Well-Spread” A

Mutual coherence: $\mu(A) = \max_{i \neq j} |\langle a_i, a_j \rangle|$.

Theorem [Elad & Donoho ’03, Gribonval & Nielsen ’03]: $\ell_1$-minimization uniquely recovers any $x_0$ with $\|x_0\|_0 < \tfrac{1}{2}\,(1 + 1/\mu(A))$.

Strong point: a checkable condition.

Weakness: low coherence can only guarantee recovery up to $O(\sqrt{m})$ nonzeros.
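Coherence is cheap to compute, which is its main appeal. A small sketch (ours; the 50 × 200 Gaussian matrix is an arbitrary example) that evaluates $\mu(A)$ and the resulting sparsity guarantee, illustrating how weak the bound can be:

```python
import numpy as np

def mutual_coherence(A):
    """mu(A): largest |<a_i, a_j>| over distinct unit-normalized columns."""
    An = A / np.linalg.norm(A, axis=0)
    G = np.abs(An.T @ An)
    np.fill_diagonal(G, 0.0)
    return float(G.max())

A = np.random.default_rng(0).standard_normal((50, 200))
mu = mutual_coherence(A)
# Recovery is guaranteed only for ||x0||_0 below (1 + 1/mu)/2:
print(mu, 0.5 * (1 + 1 / mu))   # the guarantee here covers only ~1-2 nonzeros
```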

Page 15:

GUARANTEES – Beyond Coherence

Restricted Isometry Constants: $\delta_k(A)$ is the smallest $\delta$ such that

$(1 - \delta)\,\|x\|_2^2 \ \le\ \|Ax\|_2^2 \ \le\ (1 + \delta)\,\|x\|_2^2$ for all $k$-sparse $x$.

Low coherence: “any submatrix consisting of two columns of A is well-conditioned.”

Stronger bounds by looking at larger submatrices?

Low RIC: “column submatrices of A are uniformly well-conditioned.”

Page 16:

GUARANTEES – Beyond Coherence

Theorem [Candes & Tao ’04, Candes ’07]: If $\delta_{2k}(A) < \sqrt{2} - 1$, then $\ell_1$-minimization recovers any $k$-sparse $x_0$.

For random A, this guarantees recovery up to linear sparsity: $\|x_0\|_0 < \rho^\star m$.
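Exact RICs are intractable to compute (they maximize over all $\binom{n}{k}$ column subsets), but sampling subsets gives a certified lower bound on $\delta_k$. A Monte Carlo sketch of ours, with an arbitrary 64 × 256 Gaussian test matrix:

```python
import numpy as np

def ric_lower_bound(A, k, trials=2000, seed=0):
    """Monte Carlo lower bound on the restricted isometry constant delta_k:
    sample k-column submatrices and record the worst deviation of their
    squared singular values from 1. (Lower bound only: the true delta_k
    maximizes over all column subsets.)"""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    delta = 0.0
    for _ in range(trials):
        S = rng.choice(n, size=k, replace=False)
        s = np.linalg.svd(A[:, S], compute_uv=False)
        delta = max(delta, abs(s[0] ** 2 - 1), abs(s[-1] ** 2 - 1))
    return delta

A = np.random.default_rng(1).standard_normal((64, 256)) / 8.0  # E||a_i||^2 = 1
print(ric_lower_bound(A, k=8))
```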

Page 17:

GUARANTEES – Sharp Conditions?

Necessary and sufficient condition: $x_0$ solves $(P_1)$ iff $x_0 / \|x_0\|_1$ lies on a face of the $\ell_1$ ball $C = \mathrm{conv}(\pm e_1, \dots, \pm e_n)$ that maps to a face of $AC$, the polytope spanned by the columns of A and their negatives.

Page 18:

GUARANTEES – Geometric Interpretation

Necessary and sufficient condition: $\ell_1$ uniquely recovers $x_0$ with support $S$ and signs $\sigma$ iff $\mathrm{conv}\{\sigma_i a_i : i \in S\}$ is a simplicial face of $P = AC$.

Uniform guarantees for all $k$-sparse $x_0$ ⇔ $P$ is centrally $k$-neighborly.

[Donoho ’06], [Donoho & Tanner ’08]

Page 19:

GUARANTEES – Geometric Interpretation

Geometric understanding gives sharp thresholds for sparse recovery with Gaussian A [Donoho & Tanner ’08]:

[Figure: phase-transition diagram over the aspect ratio of A (horizontal axis) and the sparsity (vertical axis). Below the strong threshold: success always. Between the strong and weak thresholds: success almost always. Above the weak threshold: failure almost always.]

Page 20:

Explicit formulas in the wide-matrix limit [Donoho & Tanner ‘08]:

GUARANTEES – Geometric Interpretation

Weak threshold:

Strong threshold:

Page 21:

GUARANTEES – Noisy Measurements

What if there is noise in the observation? $y = Ax + z$, with $z$ Gaussian or of bounded 2-norm.

Natural approach: relax the constraint:

$\min \|x\|_1 \ \ \text{subject to}\ \ \|y - Ax\|_2^2 \le \varepsilon^2$

Studied in several literatures: statistics (the LASSO), signal processing (BPDN).
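In practice this constrained program is often attacked in its Lagrangian form, $\min_x \tfrac{1}{2}\|y - Ax\|_2^2 + \lambda \|x\|_1$, which simple iterative solvers target directly. A minimal ISTA (iterative soft-thresholding) sketch; the problem sizes, noise level, and $\lambda$ below are arbitrary choices of ours:

```python
import numpy as np

def ista(A, y, lam, iters=500):
    """Iterative soft-thresholding for min_x 0.5*||y - Ax||_2^2 + lam*||x||_1,
    the Lagrangian form of the relaxed (BPDN/LASSO) program."""
    L = np.linalg.norm(A, 2) ** 2              # Lipschitz const. of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        g = x + (A.T @ (y - A @ x)) / L        # gradient step on the quadratic
        x = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # soft-threshold
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100)) / np.sqrt(40)
x0 = np.zeros(100); x0[[7, 30, 71]] = [1.5, -2.0, 1.0]
y = A @ x0 + 0.01 * rng.standard_normal(40)    # noisy observation y = Ax0 + z
print(np.linalg.norm(ista(A, y, lam=0.02) - x0))   # small: stable recovery
```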

Page 22:

GUARANTEES – Noisy Measurements

What if there is noise in the observation? $y = Ax + z$. Natural approach: $\min \|x\|_1$ subject to $\|y - Ax\|_2^2 \le \varepsilon^2$.

Theorem [Donoho, Elad & Temlyakov ’06]: Recovery is stable:

$\|\hat{x} - x_0\|_2^2 \ \le\ \frac{4\,\|z\|_2^2}{1 - \mu(A)\,(4\|x_0\|_0 - 1)}.$

See also [Candes-Romberg-Tao ’06], [Wainwright ’06], [Meinshausen & Yu ’06], [Zhao & Yu ’06], …

Page 23:

GUARANTEES – Noisy Measurements

What if there is noise in the observation? $y = Ax + z$. Natural approach: $\min \|x\|_1$ subject to $\|y - Ax\|_2^2 \le \varepsilon^2$.

Theorem [Candes-Romberg-Tao ’06]: Recovery is stable. For A satisfying an appropriate RIP condition (of order $4S$),

$\|\hat{x} - x_0\|_2 \ \le\ C_1 \|z\|_2 + C_2\, \frac{\|x_0 - x_{0,S}\|_1}{\sqrt{S}},$

where $x_{0,S}$ denotes the best $S$-term approximation of $x_0$.

See also [Donoho ’06], [Wainwright ’06], [Meinshausen & Yu ’06], [Zhao & Yu ’06], …

Page 24:

Similar sparse recovery problems explored in data streaming community:

Combinatorial algorithms → fast encoding/decoding at expense of suboptimal # of measurements

Based on ideas from group testing, expander graphs

CONNECTIONS – Sketching and Expanders

[Figure: a data stream summarized as a sketch $y = Ax \in \mathbb{R}^m$, where $x \in \mathbb{R}^n$ collects the stream counts and $A \in \mathbb{R}^{m \times n}$ with $m \ll n$.]

[Gilbert et al ‘06], [Indyk ‘08], [Xu & Hassibi ‘08]

Page 25:

CONNECTIONS – High-dimensional geometry

Sparse recovery guarantees can also be derived via probabilistic constructions from high-dimensional geometry:

• The Johnson-Lindenstrauss lemma: given $n$ points $x_1, \dots, x_n \subset \mathbb{R}^m$, a random projection $P$ into $O(\log(n)/\varepsilon^2)$ dimensions preserves pairwise distances:

$(1 - \varepsilon)\,\|x_i - x_j\|_2 \ \le\ \|P x_i - P x_j\|_2 \ \le\ (1 + \varepsilon)\,\|x_i - x_j\|_2.$

• Dvoretzky’s almost-spherical section theorem: there exist subspaces $\Gamma \subset \mathbb{R}^m$ of dimension as high as $c \cdot m$ on which the $\ell_1$ and $\ell_2$ norms are comparable:

$\forall x \in \Gamma, \quad C\sqrt{m}\,\|x\|_2 \ \le\ \|x\|_1 \ \le\ \sqrt{m}\,\|x\|_2.$

Page 26:

THE STORY SO FAR – Sparse recovery guarantees

Sparse solutions can often be recovered by linear programming.

Performance guarantees for arbitrary matrices with “uniformly well-spread columns”:

• (in)coherence
• restricted isometry

Sharp conditions via polytope geometry.

Very well-understood performance for random matrices.

What about matrices arising in vision … ?

Page 27:

PRIOR WORK – Face Recognition as Sparse Representation

Linear subspace model for images of the same face under varying illumination: stack the training images of subject $i$ as the columns of a matrix $A_i$.

If the test image $y$ is also of subject $i$, then $y \approx A_i x_i$ for some coefficients $x_i$.

Can represent any test image with respect to the entire training set as

$y = A x + e,$

where $A = [A_1, \dots, A_K]$ is the combined training dictionary, $x$ the coefficients, and $e$ accounts for corruption and occlusion.

Page 28:

PRIOR WORK – Face Recognition as Sparse Representation

Wright, Yang, Ganesh, Sastry, and Ma. Robust Face Recognition via Sparse Representation, PAMI 2009.

Underdetermined system of linear equations in the unknowns $(x, e)$:

$y = [\,A \ \ I\,] \begin{bmatrix} x \\ e \end{bmatrix}.$

The solution is not unique … but

• $x$ should be sparse: ideally, supported only on images of the same subject;
• $e$ is expected to be sparse: occlusion affects only a subset of the pixels.

Seek the sparsest solution; its convex relaxation is

$\min \|x\|_1 + \|e\|_1 \ \ \text{subject to}\ \ y = Ax + e.$
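A sketch of this extended $\ell_1$ program on synthetic data standing in for face images (the LP encoding is the same as on Page 11; nothing here is specific to the authors' implementation, and the sizes and corruption fraction are arbitrary):

```python
import numpy as np
from scipy.optimize import linprog

def robust_l1(A, y):
    """min ||x||_1 + ||e||_1  s.t.  y = Ax + e, as equality-constrained
    l1 minimization over the extended dictionary B = [A I]."""
    m, n = A.shape
    B = np.hstack([A, np.eye(m)])              # extended dictionary [A I]
    N = n + m
    I = np.eye(N)
    res = linprog(np.r_[np.zeros(N), np.ones(N)],
                  A_ub=np.vstack([np.hstack([I, -I]), np.hstack([-I, -I])]),
                  b_ub=np.zeros(2 * N),
                  A_eq=np.hstack([B, np.zeros((m, N))]), b_eq=y,
                  bounds=[(None, None)] * N + [(0, None)] * N,
                  method="highs")
    w = res.x[:N]
    return w[:n], w[n:]                        # coefficients x, error e

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 30)); A /= np.linalg.norm(A, axis=0)
x0 = np.zeros(30); x0[[2, 3]] = [0.8, 0.6]     # "one subject's" coefficients
e0 = np.zeros(100)
e0[rng.choice(100, 20, replace=False)] = rng.standard_normal(20)
x_hat, e_hat = robust_l1(A, A @ x0 + e0)       # 20% corrupted "pixels"
print(np.linalg.norm(x_hat - x0), np.linalg.norm(e_hat - e0))
```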

Page 29:

GUARANTEES – What About Vision Problems?

Behavior under varying levels of random pixel corruption:

[Figure: recognition rate vs. corruption level, with marked rates of 99.3%, 90.7%, and 37.5%.]

Can existing theory explain this phenomenon?

Page 30:

PRIOR WORK – Error Correction by $\ell_1$ Minimization

Candes and Tao [IT ’05]: given $y = Ax + e$ with a tall coding matrix $A$ and sparse error $e$:

• Apply a parity-check matrix $F$ with $FA = 0$, yielding $Fy = Fe$: an underdetermined system in the sparse $e$ only.
• Set $\hat{e} = \arg\min \|e\|_1$ subject to $Fe = Fy$.
• Recover $x$ from the clean system $Ax = y - \hat{e}$.

Succeeds whenever $\ell_1$-minimization recovers the sparse $e$ in the reduced system $Fy = Fe$.

Page 31:

PRIOR WORK – Error Correction by $\ell_1$ Minimization

This work: instead solve

$\min \|x\|_1 + \|e\|_1 \ \ \text{subject to}\ \ y = Ax + e.$

Can be applied when A is wide (no parity-check matrix exists).

Page 32:

PRIOR WORK – Error Correction by $\ell_1$ Minimization

Candes and Tao [IT ’05]: succeeds whenever $\ell_1$ recovers the sparse $e$ in the reduced system $Fy = Fe$.

This work: succeeds whenever $\ell_1$ recovers the sparse pair $(x, e)$ in the expanded system $y = [A\ \ I]\,[x; e]$.
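A sketch of the Candes-Tao pipeline on synthetic data (our construction: $F$ is built from the left null space of $A$ via scipy's null_space, and the $\ell_1$ solver is the same LP encoding as the Page 11 sketch; all sizes are arbitrary):

```python
import numpy as np
from scipy.linalg import null_space
from scipy.optimize import linprog

def l1_min(A, y):
    """min ||x||_1 s.t. Ax = y (same LP encoding as the Page 11 sketch)."""
    m, n = A.shape
    I = np.eye(n)
    res = linprog(np.r_[np.zeros(n), np.ones(n)],
                  A_ub=np.vstack([np.hstack([I, -I]), np.hstack([-I, -I])]),
                  b_ub=np.zeros(2 * n),
                  A_eq=np.hstack([A, np.zeros((m, n))]), b_eq=y,
                  bounds=[(None, None)] * n + [(0, None)] * n, method="highs")
    return res.x[:n]

rng = np.random.default_rng(0)
n_eq, n_var, k = 60, 20, 8                 # tall coding matrix, 8 gross errors
A = rng.standard_normal((n_eq, n_var))
x_true = rng.standard_normal(n_var)
e = np.zeros(n_eq)
e[rng.choice(n_eq, k, replace=False)] = rng.standard_normal(k)
y = A @ x_true + e

F = null_space(A.T).T                      # parity check: F @ A = 0
e_hat = l1_min(F, F @ y)                   # Fy = Fe: sparse e only
x_hat, *_ = np.linalg.lstsq(A, y - e_hat, rcond=None)   # clean system
print(np.linalg.norm(x_hat - x_true))      # tiny when decoding succeeds
```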

Page 33:

GUARANTEES – What About Vision Problems?

Results so far suggest this should not succeed:

• $x$ is very sparse: its number of nonzeros is on the order of the number of images per subject, and it is often nonnegative (illumination cone models);
• $e$ should be as dense as possible: we want robustness to the highest possible corruption;
• A is highly coherent (the bouquet of training images occupies a tiny volume).

Page 34:

SIMULATION – Dense Error Correction?

As the dimension $m$ grows, an even more striking phenomenon emerges:

[Figure: fraction of successful recoveries vs. error fraction, for increasing dimension.]


Page 39:

SIMULATION – Dense Error Correction?

Conjecture: If the matrices $A$ are sufficiently coherent, then for any error fraction $\rho < 1$, as $m \to \infty$, solving

$\min \|x\|_1 + \|e\|_1 \ \ \text{subject to}\ \ y = Ax + e$

corrects almost any error $e$ with $\|e\|_0 \le \rho\, m$.

Page 40:

DATA MODEL – Cross-and-Bouquet

Our model for $A$ should capture the fact that the columns $a_i = \mu + \nu_i$ are tightly clustered around a common mean $\mu$ (the “bouquet”), while the error term ranges over the standard basis (the “cross”).

We call this the “Cross-and-Bouquet” (CAB) model.

• The mean $\mu$ is mostly incoherent with the standard (error) basis.
• The norms of the deviations $\nu_i$ are well-controlled.
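A toy generator for CAB-style matrices, to make the picture concrete; the parameterization (in particular the spread knob nu) is our own, not the tutorial's, and the check at the end confirms that the bouquet is far too coherent for the classical guarantees:

```python
import numpy as np

def cross_and_bouquet(m, n, nu=0.1, seed=0):
    """Sample a 'bouquet' of unit columns clustered around a common mean mu,
    then append the 'cross' (the identity, for the error term e).
    The spread parameter nu is an assumption of this sketch."""
    rng = np.random.default_rng(seed)
    mu = rng.standard_normal(m); mu /= np.linalg.norm(mu)
    A = mu[:, None] + nu * rng.standard_normal((m, n)) / np.sqrt(m)
    A /= np.linalg.norm(A, axis=0)
    return np.hstack([A, np.eye(m)])

B = cross_and_bouquet(m=100, n=50)
bouquet = B[:, :50]
G = np.abs(bouquet.T @ bouquet); np.fill_diagonal(G, 0.0)
print(G.max())   # mutual coherence of the bouquet is close to 1
```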

Page 41:

ASYMPTOTIC SETTING – Weak Proportional Growth

• Observation dimension $m \to \infty$.
• Problem size grows proportionally: $n/m \to \delta$.
• Error support grows proportionally: $\|e_0\|_0 / m \to \rho$.
• Support size of $x_0$ sublinear in $m$: $\|x_0\|_0 = k$ with $k/m \to 0$.

Page 42:

ASYMPTOTIC SETTING – Weak Proportional Growth

Sublinear growth of $k = \|x_0\|_0$ is necessary to correct arbitrary fractions of errors: we need at least $k$ “clean” (uncorrupted) equations.

Empirical Observation: If $k$ grows linearly in $m$, there is a sharp phase transition at some critical error fraction below 1.

Page 43:

NOTATION – Correct Recovery of Solutions

Whether $(x_0, e_0)$ is recovered depends only on its signs and support.

Call a signs-and-support pattern recoverable if every pair $(x_0, e_0)$ with these signs and support is recovered by the $\ell_1$ minimization, and the minimizer is unique.

Page 44:

MAIN RESULT – Correction of Arbitrary Error Fractions

“$\ell_1$-minimization recovers any sparse signal from almost any error with density less than 1.”

Page 45:

SIMULATION – Arbitrary Errors in WPG

[Figure: fraction of correct recoveries for increasing $m$.]

Page 46:

SIMULATION – Phase Transition in Proportional Growth

What if $k$ grows linearly with $m$?

Asymptotically sharp phase transition, similar to that observed by Donoho and Tanner for homogeneous Gaussian matrices.

Page 47:

SIMULATION – Comparison to Alternative Approaches

• “L1 - [A I]”: $\ell_1$ minimization on the expanded system $y = [A\ \ I]\,[x; e]$ (this work).
• “L1 - comp”: $\ell_1$ decoding via the parity-check reduction, Candes + Tao ’05.
• “ROMP”: Regularized Orthogonal Matching Pursuit, Needell + Vershynin ’08.

Page 48:

SIMULATION – Error Correction with Real Faces

For real face images, weak proportional growth corresponds to the setting where the total image resolution grows proportionally to the size of the database.

[Figure: fraction of correct recoveries (the 50% probability point is marked). Above: corrupted images. Below: reconstructions.]

Page 49:

SUMMARY – Sparse Representation in Theory and Practice

So far:
• Face recognition as a motivating example
• Sparse recovery guarantees for generic systems
• New theory and new phenomena from face data

After the break:
• Algorithms for sparse recovery
• Many more applications in vision and sensor networks
• Matrix extensions: missing data imputation and robust PCA