Sparse Representation and Compressed Sensing: Theory and Algorithms

Yi Ma 1,2   Allen Yang 3   John Wright 1

CVPR Tutorial, June 20, 2009

1 Microsoft Research Asia
2 University of Illinois at Urbana-Champaign
3 University of California, Berkeley
MOTIVATION – Applications to a variety of vision problems

• Face Recognition:
  Wright et al. PAMI ’09, Huang CVPR ’08, Wagner CVPR ’09, …
• Image Enhancement and Superresolution:
  Elad TIP ’06, Huang CVPR ’08, …
• Image Classification:
  Mairal CVPR ’08, Rodriguez ’07, many others …
• Multiple Motion Segmentation:
  Rao CVPR ’08, Elhamifar CVPR ’09, …
• … and many others, including this conference
When and why can we expect such good performance?
A closer look at the theory …
SPARSE REPRESENTATION – Model problem

    y = Ax

Observation y ∈ R^m, unknown x ∈ R^n, A ∈ R^{m×n} with m ≪ n.

Underdetermined system of linear equations.

Two interpretations:
• Compressed sensing: A as sensing matrix
• Sparse representation: A as overcomplete dictionary

Many more unknowns than observations → no unique solution.
• Classical answer: minimum ℓ2-norm solution
• Emerging applications: instead desire sparse solutions
SPARSE SOLUTIONS – Uniqueness

Look for the sparsest solution:

    min ‖x‖0  subject to  y = Ax

‖·‖0 – number of nonzero elements.

Is the sparsest solution unique?

spark(A) – size of the smallest set of linearly dependent columns of A.

If two sparse solutions x1, x2 satisfy A1 x1 = y = A2 x2 (with A1, A2 the corresponding column submatrices), then A1 x1 − A2 x2 = 0, so their combined support indexes a linearly dependent set of columns.

Proposition [Gorodnitsky & Rao ’97]:
If y = A x0 with ‖x0‖0 < spark(A)/2, then x0 is the unique solution to

    min ‖x‖0  subject to  y = Ax.
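The spark condition above can be checked by brute force on toy examples. A minimal sketch (our illustrative code, not from the tutorial; exponential time, tiny matrices only):

```python
import itertools
import numpy as np

def spark(A, tol=1e-10):
    """Size of the smallest set of linearly dependent columns of A.
    Brute force over all column subsets -- exponential, tiny matrices only."""
    m, n = A.shape
    for k in range(1, n + 1):
        for cols in itertools.combinations(range(n), k):
            if np.linalg.matrix_rank(A[:, list(cols)], tol=tol) < k:
                return k
    return np.inf  # all columns linearly independent

# Any two columns of this A are independent, but all three are dependent,
# so spark(A) = 3 and any solution with a single nonzero is unique.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
print(spark(A))
```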
SPARSE SOLUTIONS – So How Do We Compute It?

Looking for the sparsest solution:

    (P0)  min ‖x‖0  subject to  y = Ax

Bad News: (P0) is NP-hard in the worst case, and hard to approximate within certain constants [Amaldi & Kann ’95].

Maybe we can still solve important cases?
• Greedy algorithms: Matching Pursuit, Orthogonal Matching Pursuit [Mallat & Zhang ’93], CoSaMP [Needell & Tropp ’08]
• Convex programming [Chen, Donoho & Saunders ’94]
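A minimal NumPy sketch of Orthogonal Matching Pursuit (our illustration, not the tutorial's code): greedily pick the column most correlated with the residual, then re-fit on the selected support by least squares.

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal Matching Pursuit: recover a k-sparse x with y = A x.
    Assumes the columns of A are (approximately) unit-norm."""
    n = A.shape[1]
    residual = y.astype(float).copy()
    support = []
    for _ in range(k):
        j = int(np.argmax(np.abs(A.T @ residual)))   # most correlated column
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef          # orthogonalize residual
    x = np.zeros(n)
    x[support] = coef
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 100))
A /= np.linalg.norm(A, axis=0)
x0 = np.zeros(100)
x0[[5, 40, 77]] = [2.0, -1.5, 1.0]
x_hat = omp(A, A @ x0, k=3)
```

For an incoherent Gaussian dictionary and a well-separated 3-sparse signal, OMP typically recovers the support exactly.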
SPARSE SOLUTIONS – The ℓ1 Heuristic

Looking for the sparsest solution:

    (P0)  min ‖x‖0  subject to  y = Ax        Intractable.

        ↓ convex relaxation

    (P1)  min ‖x‖1  subject to  y = Ax        Linear program, solvable in polynomial time.

Why ℓ1? It is the convex envelope of ℓ0 over the unit cube.

Rich applied history – geosciences, sparse coding in vision, statistics.
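(P1) becomes a standard linear program after splitting x = u − v with u, v ≥ 0, so that ‖x‖1 = Σ(u + v). A sketch using SciPy's generic LP solver (the helper name `basis_pursuit` and problem sizes are ours):

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    """Solve (P1): min ||x||_1 subject to A x = y, as a linear program.
    Variables are split as x = u - v with u, v >= 0."""
    m, n = A.shape
    c = np.ones(2 * n)                       # objective: sum(u) + sum(v)
    res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=y,
                  bounds=[(0, None)] * (2 * n), method="highs")
    return res.x[:n] - res.x[n:]

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
x0 = np.zeros(50)
x0[[3, 17, 42]] = [1.0, -2.0, 0.5]
x_hat = basis_pursuit(A, A @ x0)
```

With 20 Gaussian measurements of a 3-sparse vector in R^50, this sits comfortably inside the success region, so the LP recovers x0 exactly.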
EQUIVALENCE – A stronger motivation

In many cases, the solutions to (P0) and (P1) are exactly the same:

Theorem [Candes & Tao ’04, Donoho ’04]:
For Gaussian A, with overwhelming probability, whenever ‖x0‖0 < ρ* m,

    x0 = argmin ‖x‖1  subject to  Ax = A x0.

“ℓ1-minimization recovers any sufficiently sparse solution.”
GUARANTEES – “Well-Spread” A

Mutual coherence μ(A): largest absolute inner product between distinct (normalized) columns of A. Low mutual coherence: the columns are well-spread in the space.

Theorem [Elad & Donoho ’03, Gribonval & Nielsen ’03]:
ℓ1-minimization uniquely recovers any x0 with ‖x0‖0 < (1 + 1/μ(A)) / 2.

Strong point: checkable condition.
Weakness: low coherence can only guarantee recovery up to O(√m) nonzeros.
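Mutual coherence is directly computable from the Gram matrix of the normalized dictionary. A quick sketch (our code; the extremes below illustrate the best and worst cases):

```python
import numpy as np

def mutual_coherence(A):
    """Largest |inner product| between distinct normalized columns of A."""
    An = A / np.linalg.norm(A, axis=0)   # unit-norm columns
    G = np.abs(An.T @ An)                # Gram matrix of inner products
    np.fill_diagonal(G, 0.0)             # ignore self-correlations
    return G.max()

# Orthonormal columns: coherence 0 (best possible spread).
print(mutual_coherence(np.eye(4)))
# A repeated column drives coherence to 1 (worst case: recovery guarantee collapses).
print(mutual_coherence(np.array([[1.0, 0.0, 1.0],
                                 [0.0, 1.0, 0.0]])))
```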
GUARANTEES – Beyond Coherence

Low coherence: “any submatrix consisting of two columns of A is well-conditioned.”
Stronger bounds by looking at larger submatrices?

Restricted Isometry Constants: δk is the smallest δ such that for all k-sparse x,

    (1 − δ) ‖x‖2² ≤ ‖Ax‖2² ≤ (1 + δ) ‖x‖2².

Low RIC: “column submatrices of A are uniformly well-conditioned.”

Theorem [Candes & Tao ’04, Candes ’07]:
If δ2k is sufficiently small, then ℓ1-minimization recovers any k-sparse x0.

For random A, this guarantees recovery up to linear sparsity: ‖x0‖0 < ρ* m.
GUARANTEES – Sharp Conditions?

Necessary and sufficient condition [Donoho ’06]:
x0 solves (P1) iff its scaled version lies on a face of the ℓ1 ball that maps to a face of P, the polytope spanned by the columns of A and their negatives.

ℓ1 uniquely recovers every x0 with support S and signs σ iff the corresponding simplex is a simplicial face of P.

Uniform guarantees for k-sparse x0 ⇔ P is centrally k-neighborly [Donoho & Tanner ’08].
GUARANTEES – Geometric Interpretation

Geometric understanding gives sharp thresholds for sparse recovery with Gaussian A [Donoho & Tanner ’08].

[Figure: phase-transition diagram; horizontal axis – aspect ratio of A, vertical axis – sparsity. Regions: failure almost always; success almost always (below the weak threshold); success always (below the strong threshold).]

Explicit formulas for the weak and strong thresholds are available in the wide-matrix limit [Donoho & Tanner ’08].
GUARANTEES – Noisy Measurements

What if there is noise in the observation?  y = Ax + z, with z Gaussian or of bounded 2-norm.

Natural approach: relax the constraint:

    min ‖x‖1  subject to  ‖y − Ax‖2² ≤ ε²

Studied in several literatures: statistics – LASSO; signal processing – Basis Pursuit Denoising (BPDN).

Theorem [Donoho, Elad & Temlyakov ’06]: Recovery is stable:

    ‖x̂ − x0‖2² ≤ 4‖z‖2² / (1 − μ(A)(4‖x0‖0 − 1)).

Theorem [Candes, Romberg & Tao ’06]: Recovery is stable – for A satisfying an appropriate restricted isometry condition (on δ4S),

    ‖x̂ − x0‖2 ≤ C1 ‖z‖2 + C2 ‖x0 − x0,S‖1 / √S,

where x0,S is the best S-term approximation of x0.

See also [Donoho ’06], [Wainwright ’06], [Meinshausen & Yu ’06], [Zhao & Yu ’06], …
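The Lagrangian form of this program, min ½‖y − Ax‖2² + λ‖x‖1 (the LASSO), can be solved by iterative soft-thresholding. A minimal ISTA sketch (our code; the step size, λ, and problem sizes are illustrative choices, not from the tutorial):

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista(A, y, lam, n_iter=1000):
    """Iterative soft-thresholding for min 0.5 ||y - A x||^2 + lam ||x||_1."""
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x + A.T @ (y - A @ x) / L, lam / L)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100)) / np.sqrt(40)   # roughly unit-norm columns
x0 = np.zeros(100)
x0[[10, 60]] = [3.0, -2.0]
y = A @ x0 + 0.01 * rng.standard_normal(40)        # small measurement noise
x_hat = ista(A, y, lam=0.02)
```

With mild noise and a small λ, the estimate concentrates on the true support, up to the usual small shrinkage bias.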
CONNECTIONS – Sketching and Expanders

Similar sparse recovery problems are explored in the data-streaming community: the observation y ∈ R^m is a “sketch” of the data stream x ∈ R^n, with A ∈ R^{m×n}, m ≪ n.

Combinatorial algorithms → fast encoding/decoding at the expense of a suboptimal number of measurements.
Based on ideas from group testing and expander graphs.

[Gilbert et al ’06], [Indyk ’08], [Xu & Hassibi ’08]
CONNECTIONS – High-Dimensional Geometry

Sparse recovery guarantees can also be derived via probabilistic constructions from high-dimensional geometry:

• Dvoretsky’s almost-spherical section theorem: there exist subspaces Γ ⊂ R^m of dimension as high as c·m on which the ℓ1 and ℓ2 norms are comparable:

    for all x ∈ Γ,  C √m ‖x‖2 ≤ ‖x‖1 ≤ √m ‖x‖2.

• The Johnson–Lindenstrauss lemma: given n points x1, …, xn ⊂ R^m, a random projection P into O(log(n)/ε²) dimensions preserves pairwise distances:

    (1 − ε) ‖xi − xj‖2 ≤ ‖P xi − P xj‖2 ≤ (1 + ε) ‖xi − xj‖2.
THE STORY SO FAR – Sparse recovery guarantees

• Sparse solutions can often be recovered by linear programming.
• Performance guarantees for arbitrary matrices with “uniformly well-spread columns”: (in)coherence, restricted isometry.
• Sharp conditions via polytope geometry.
• Very well-understood performance for random matrices.

What about matrices arising in vision…?
PRIOR WORK – Face Recognition as Sparse Representation

Linear subspace model for images of the same face under varying illumination: if the test image y is also of subject i, then y ≈ Ai xi for some coefficient vector xi, where Ai stacks the training images of subject i.

Can represent any test image with respect to the combined training dictionary A = [A1, …, Ak] as

    y = A x + e,    x – coefficients,  e – corruption/occlusion.

Underdetermined system of linear equations in the unknowns x, e.

Wright, Yang, Ganesh, Sastry, and Ma. Robust Face Recognition via Sparse Representation, PAMI 2009.

PRIOR WORK – Face Recognition as Sparse Representation

Seek the sparsest solution. The solution is not unique … but:
• x should be sparse: ideally, supported only on images of the same subject;
• e is expected to be sparse: occlusion only affects a subset of the pixels.

Convex relaxation:  min ‖x‖1 + ‖e‖1  subject to  y = A x + e.
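Once a sparse code x̂ is found over the combined dictionary, classification proceeds by class-wise residuals: keep only the coefficients belonging to one subject and see how well they reconstruct y. A schematic sketch (the helper name `src_classify` and its interface are ours):

```python
import numpy as np

def src_classify(A, x_hat, y, labels):
    """Assign y to the class whose coefficients in x_hat best reconstruct it.
    labels[j] is the subject who contributed column j of A."""
    labels = np.asarray(labels)
    best_class, best_res = None, np.inf
    for c in np.unique(labels):
        xc = np.where(labels == c, x_hat, 0.0)   # keep only class-c coefficients
        res = np.linalg.norm(y - A @ xc)         # reconstruction residual
        if res < best_res:
            best_class, best_res = c, res
    return best_class

# Toy example: 4 training "images", two per subject.
A = np.eye(4)
labels = [0, 0, 1, 1]
x_hat = np.array([0.9, 0.8, 0.05, 0.0])          # mass concentrated on subject 0
y = A @ x_hat
print(src_classify(A, x_hat, y, labels))         # 0
```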
GUARANTEES – What About Vision Problems?

Behavior under varying levels of random pixel corruption:

[Figure: recognition rate vs. corruption level – 99.3%, 90.7%, 37.5% at increasing levels of corruption.]

Can existing theory explain this phenomenon?
PRIOR WORK – Error Correction by ℓ1 Minimization

Candes and Tao [IT ’05]:
• Apply a parity-check matrix B such that BA = 0, yielding ỹ = B y = B e.
• Recover e from the clean underdetermined system ỹ = B e, which involves the sparse e only.
• Set x̂ to the least-squares solution of A x = y − ê.
Succeeds whenever e is sufficiently sparse in the reduced system.

This work: instead solve

    min ‖x‖1 + ‖e‖1  subject to  y = A x + e.

Can be applied when A is wide (no parity check exists).
Succeeds whenever (x, e) is sufficiently sparse in the expanded system y = [A I] [x; e].
GUARANTEES – What About Vision Problems?

Results so far suggest ℓ1 should not succeed here:
• x0 very sparse: only # images per subject nonzeros, and often nonnegative (illumination cone models);
• e as dense as possible: we want robustness to the highest possible corruption;
• A highly coherent (the volume spanned by its columns → 0).
SIMULATION – Dense Error Correction?

As the dimension m → ∞, an even more striking phenomenon emerges:

[Figure: fraction of successful recoveries stays high even as the error fraction approaches 1.]

Conjecture: If the matrices A are sufficiently coherent, then for any error fraction ρ < 1, as m → ∞, solving

    min ‖x‖1 + ‖e‖1  subject to  y = A x + e

corrects almost any error e with ‖e‖0 ≤ ρ m.
DATA MODEL – Cross-and-Bouquet

Our model for A should capture the fact that its columns are tightly clustered around a common mean μ:

    ai = μ + νi.

We call this the “Cross-and-Bouquet” (CAB) model:
• the mean μ is mostly incoherent with the standard (error) basis;
• the norm of the deviations νi is well-controlled.
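A small end-to-end simulation in the spirit of the CAB model, solving the expanded ℓ1 program over [A I] with a generic LP solver (our construction; the bouquet spread, corruption level, and sizes are arbitrary illustrative choices):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
m, n = 100, 20
# Bouquet: columns clustered around a common unit-norm mean mu.
mu = rng.standard_normal(m)
mu /= np.linalg.norm(mu)
A = mu[:, None] + 0.1 * rng.standard_normal((m, n))
A /= np.linalg.norm(A, axis=0)

x0 = np.zeros(n)
x0[:2] = [1.0, 0.5]                           # sparse coefficients
e0 = np.zeros(m)
e0[rng.choice(m, 10, replace=False)] = rng.standard_normal(10)  # 10% corruption
y = A @ x0 + e0

# min ||x||_1 + ||e||_1  s.t.  y = [A I][x; e], via the split w = u - v, u,v >= 0.
B = np.hstack([A, np.eye(m)])
N = B.shape[1]
res = linprog(np.ones(2 * N), A_eq=np.hstack([B, -B]), b_eq=y,
              bounds=[(0, None)] * (2 * N), method="highs")
w = res.x[:N] - res.x[N:]
x_hat, e_hat = w[:n], w[n:]
```

At this modest corruption level the LP separates the sparse coefficients from the corruption exactly; the striking regime in the slides is what happens as the corruption fraction grows toward 1 with m.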
ASYMPTOTIC SETTING – Weak Proportional Growth

• Observation dimension m → ∞.
• Problem size grows proportionally: n = δ m.
• Error support grows proportionally: ‖e0‖0 = ρ m.
• Support size of x0 sublinear in m: ‖x0‖0 = o(m).

Sublinear growth of ‖x0‖0 is necessary to correct arbitrary fractions of errors: we need at least as many “clean” equations as there are unknowns in x0.

Empirical observation: if ‖x0‖0 grows linearly in m, there is a sharp phase transition at some error fraction ρ* < 1.

NOTATION – Correct Recovery of Solutions

Whether (x0, e0) is recovered depends only on the signs and support of e0.
Call (x0, σ, S) ℓ1-recoverable if (x0, e0) is recovered for errors e0 with these signs and support, and the ℓ1 minimizer is unique.
MAIN RESULT – Correction of Arbitrary Error Fractions

In weak proportional growth, “ℓ1 recovers any sparse signal from almost any error with density less than 1.”

SIMULATION – Arbitrary Errors in WPG

[Figure: fraction of correct recoveries for increasing m.]
SIMULATION – Phase Transition in Proportional Growth

What if ‖x0‖0 grows linearly with m? Asymptotically sharp phase transition, similar to that observed by Donoho and Tanner for homogeneous Gaussian matrices.

SIMULATION – Comparison to Alternative Approaches

[Figure legend:]
• “L1 – [A I]”: ℓ1 minimization on the expanded system y = [A I][x; e].
• “L1 – comp”: ℓ1 minimization on the reduced (parity-check) system, Candes + Tao ’05.
• “ROMP”: Regularized Orthogonal Matching Pursuit, Needell + Vershynin ’08.
SIMULATION – Error Correction with Real Faces

For real face images, weak proportional growth corresponds to the setting where the total image resolution grows proportionally to the size of the database.

[Figure: fraction of correct recoveries; above – corrupted images, below – reconstructions; dashed line – 50% probability of correct recovery.]
SUMMARY – Sparse Representation in Theory and Practice

So far:
• Face recognition as a motivating example
• Sparse recovery guarantees for generic systems
• New theory and new phenomena from face data

After the break:
• Algorithms for sparse recovery
• Many more applications in vision and sensor networks
• Matrix extensions: missing data imputation and robust PCA