
Unbiased Risk Estimation for Sparse Analysis Regularization
Charles Deledalle¹, Samuel Vaiter¹, Gabriel Peyré¹, Jalal Fadili³ and Charles Dossal²

¹CEREMADE, Université Paris–Dauphine — ²GREY'C, ENSICAEN — ³IMB, Université Bordeaux I

Problem statement

Consider the convex but non-smooth Analysis Sparsity Regularization problem

x⋆(y, λ) ∈ argmin_{x ∈ R^N} (1/2)||y − Φx||² + λ||D∗x||₁        (P_λ(y))

which aims at inverting

y = Φx0 + w

by promoting sparsity and with

• x0 ∈ R^N the unknown image of interest,
• y ∈ R^Q the low-dimensional noisy observation of x0,
• Φ ∈ R^{Q×N} a linear operator that models the acquisition process,
• w ∼ N(0, σ²Id_Q) the noise component,
• D ∈ R^{N×P} an analysis dictionary, and
• λ > 0 a regularization parameter.
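A minimal numerical sketch of solving P_λ(y), assuming CVXPY is available as a generic convex solver; the 1D anisotropic total-variation dictionary and the random Φ below are hypothetical toy choices, not the operators used in the poster's experiments.

```python
import numpy as np
import cvxpy as cp

def solve_analysis_regularization(y, Phi, D, lam):
    """Solve P_lam(y): min_x 0.5 * ||y - Phi x||^2 + lam * ||D* x||_1."""
    x = cp.Variable(Phi.shape[1])
    objective = cp.Minimize(0.5 * cp.sum_squares(y - Phi @ x) + lam * cp.norm1(D.T @ x))
    cp.Problem(objective).solve()
    return x.value

# Hypothetical toy instance: D* computes 1D finite differences (anisotropic TV).
rng = np.random.default_rng(0)
N, Q = 64, 32
D = np.eye(N, N - 1, k=-1) - np.eye(N, N - 1)   # column j: +1 at row j+1, -1 at row j
Phi = rng.standard_normal((Q, N)) / np.sqrt(Q)  # random acquisition operator
x0 = np.repeat([0.0, 1.0, -0.5, 0.0], N // 4)   # piecewise-constant signal
sigma = 0.1
y = Phi @ x0 + sigma * rng.standard_normal(Q)   # y = Phi x0 + w
xstar = solve_analysis_regularization(y, Phi, D, lam=0.1)
```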

How to choose the value of the parameter λ?

Risk-based selection of λ

• Risk associated to λ: a measure of the expected quality of x⋆(y, λ) w.r.t. x0,

  R(λ) = E_w ||x⋆(y, λ) − x0||² .

• The optimal (theoretical) λ minimizes the risk.

The risk is unknown since it depends on x0.

Can we estimate the risk solely from x⋆(y, λ)?

Risk estimation

• Assume y ↦ Φx⋆(y, λ) is weakly differentiable (a fortiori uniquely defined).

Prediction risk estimation via SURE

• The Stein Unbiased Risk Estimator (SURE):

  SURE(y, λ) = ||y − Φx⋆(y, λ)||² − σ²Q + 2σ² tr(∂Φx⋆(y, λ)/∂y),

  where the trace term is an estimator of the degrees of freedom (DOF), is an unbiased estimator of the prediction risk [Stein, 1981]:

  E_w(SURE(y, λ)) = E_w(||Φx0 − Φx⋆(y, λ)||²) .
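As a sanity check, the trace term in SURE can be approximated by a generic Monte Carlo finite-difference divergence estimator (this is not the closed-form expression derived further down the poster); `solver(y, lam)` stands for any routine returning x⋆(y, λ), e.g. the hypothetical sketch above.

```python
import numpy as np

def sure(y, Phi, solver, lam, sigma, eps=1e-3, n_probes=8, rng=None):
    """SURE(y, lam) with a Monte Carlo finite-difference estimate of tr(dPhi x*(y, lam)/dy)."""
    rng = np.random.default_rng(rng)
    Q = y.shape[0]
    mu = Phi @ solver(y, lam)                  # Phi x*(y, lam)
    dof = 0.0
    for _ in range(n_probes):
        delta = rng.standard_normal(Q)         # random probing direction
        mu_eps = Phi @ solver(y + eps * delta, lam)
        dof += delta @ (mu_eps - mu) / eps     # ~ <delta, (dPhi x*/dy) delta>
    dof /= n_probes                            # E[<delta, J delta>] = tr(J) for delta ~ N(0, Id)
    return np.sum((y - mu) ** 2) - sigma**2 * Q + 2 * sigma**2 * dof
```

On the toy instance above one would call, for example, `sure(y, Phi, lambda yy, l: solve_analysis_regularization(yy, Phi, D, l), 0.1, sigma)`.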

Projection risk estimation via GSURE

• Let Π = Φ∗(ΦΦ∗)⁺Φ be the orthogonal projector on ker(Φ)⊥ = Im(Φ∗),
• Denote x_ML(y) = Φ∗(ΦΦ∗)⁺y,
• The Generalized Stein Unbiased Risk Estimator (GSURE):

  GSURE(y, λ) = ||x_ML(y) − Πx⋆(y, λ)||² − σ² tr((ΦΦ∗)⁺) + 2σ² tr((ΦΦ∗)⁺ ∂Φx⋆(y, λ)/∂y)

  is an unbiased estimator of the projection risk [Vaiter et al., 2012]:

  E_w(GSURE(y, λ)) = E_w(||Πx0 − Πx⋆(y, λ)||²)

  (see also [Eldar, 2009, Pesquet et al., 2009, Vonesch et al., 2008] for similar results).
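A direct transcription of the GSURE formula, assuming matrices small enough for an explicit pseudo-inverse; the argument `trace_term` stands for an estimate of tr((ΦΦ∗)⁺ ∂Φx⋆(y, λ)/∂y), whose computation is detailed below.

```python
import numpy as np

def gsure(y, Phi, xstar, sigma, trace_term):
    """GSURE(y, lam) = ||x_ML - Pi x*||^2 - sigma^2 tr((Phi Phi^*)^+) + 2 sigma^2 * trace_term."""
    pinv = np.linalg.pinv(Phi @ Phi.T)          # (Phi Phi^*)^+
    x_ml = Phi.T @ (pinv @ y)                   # x_ML(y) = Phi^* (Phi Phi^*)^+ y
    Pi_xstar = Phi.T @ (pinv @ (Phi @ xstar))   # Pi x*, with Pi = Phi^* (Phi Phi^*)^+ Phi
    return (np.sum((x_ml - Pi_xstar) ** 2)
            - sigma**2 * np.trace(pinv)
            + 2 * sigma**2 * trace_term)
```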

Illustration of risk estimation

[Figure: illustration of risk estimation, relating the unknown quantities to the known ones and to the SURE and GSURE estimates; here, x⋆ denotes x⋆(y, λ) for an arbitrary value of λ.]

How to estimate the quantity tr((ΦΦ∗)⁺ ∂Φx⋆(y, λ)/∂y)?

Main notations and assumptions

• Let I = supp(D∗x⋆(y, λ)) be the support of D∗x⋆(y, λ),
• Let J = I^c be the co-support of D∗x⋆(y, λ),
• Let D_I be the submatrix of D whose columns are indexed by I,
• Let s_I = sign(D∗x⋆(y, λ))_I be the subvector of D∗x⋆(y, λ) whose entries are indexed by I,
• Let G_J = Ker D∗_J be the “cospace” associated to x⋆(y, λ),
• To study the local behaviour of x⋆(y, λ), we impose Φ to be “invertible” on G_J:

  G_J ∩ Ker Φ = {0},

• This allows us to define the matrix

  A^[J] = U(U∗Φ∗ΦU)⁻¹U∗,

  where U is a matrix whose columns form a basis of G_J,
• In this case, we obtain an implicit equation:

  x⋆(y, λ) solution of P_λ(y)  ⇔  x⋆(y, λ) = x̂(y, λ) ≜ A^[J]Φ∗y − λA^[J]D_I s_I .
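A dense sketch of this implicit equation, assuming G_J ∩ Ker Φ = {0}; the support threshold `tol` and the use of SciPy's `null_space` to obtain U are implementation choices, not part of the poster.

```python
import numpy as np
from scipy.linalg import null_space

def implicit_solution(Phi, D, y, lam, xstar, tol=1e-6):
    """Rebuild x_hat = A[J] Phi^* y - lam A[J] D_I s_I from the (co-)support of D^* x*."""
    corr = D.T @ xstar
    I = np.flatnonzero(np.abs(corr) > tol)   # support of D^* x*
    J = np.flatnonzero(np.abs(corr) <= tol)  # co-support
    s_I = np.sign(corr[I])
    U = null_space(D[:, J].T)                # columns form a basis of G_J = Ker D_J^*
    A_J = U @ np.linalg.solve(U.T @ (Phi.T @ Phi) @ U, U.T)
    return A_J @ (Phi.T @ y) - lam * A_J @ (D[:, I] @ s_I)
```

Away from the exceptional set appearing in the theorem below, one expects `implicit_solution(Phi, D, y, lam, xstar)` to coincide with `xstar` up to solver accuracy.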

Is this relation true in a neighbourhood of (y,λ)?

Theorem (Local Parameterization)

• Even if the solutions x⋆(y, λ) of P_λ(y) might not be unique, Φx⋆(y, λ) is uniquely defined.
• If (y, λ) ∉ H, then for (ȳ, λ̄) close to (y, λ), x̂(ȳ, λ̄) is a solution of P_λ̄(ȳ), where

  x̂(ȳ, λ̄) = A^[J]Φ∗ȳ − λ̄A^[J]D_I s_I .

• Hence, we can write

  ∂Φx⋆(y, λ)/∂y = ΦA^[J]Φ∗ ,

• Moreover, the DOF can be estimated by

  tr(∂Φx⋆(y, λ)/∂y) = dim(G_J) .
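The DOF estimate then reduces to a rank computation; a minimal sketch, again with a hypothetical support threshold `tol`:

```python
import numpy as np

def dof(D, xstar, tol=1e-6):
    """DOF estimate dim(G_J) = dim Ker(D_J^*) = N - rank(D_J)."""
    N = D.shape[0]
    J = np.flatnonzero(np.abs(D.T @ xstar) <= tol)  # co-support of D^* x*
    return N - np.linalg.matrix_rank(D[:, J])
```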

Can we compute this quantity efficiently?

[Figure: schematic of the solution path in the (x1, x2) plane, from the solution x_{λ0} of P_0(y) at λ_0 = 0 to x_{λ_k} = 0 at λ_k.]

Computation of GSURE

• One has, for Z ∼ N(0, Id_Q),

  tr((ΦΦ∗)⁺ ∂Φx⋆(y, λ)/∂y) = E_Z(⟨ν(Z), Φ∗(ΦΦ∗)⁺Z⟩) ,

  where, for any z ∈ R^Q, ν = ν(z) solves the following linear system

  [ Φ∗Φ   D_J ] [ ν ]   [ Φ∗z ]
  [ D∗_J    0 ] [ ν̃ ] = [  0  ] .

• In practice, by the law of large numbers, the expectation is replaced by an empirical mean.
• The computation of ν(z) is achieved by solving the linear system with a conjugate gradient solver.
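A dense sketch of this Monte Carlo trace estimator; it assembles the saddle-point system explicitly and uses a least-squares solve instead of the matrix-free conjugate-gradient solver mentioned above (the block matrix can be rank-deficient when D_J does not have full column rank).

```python
import numpy as np

def gsure_trace_term(Phi, D, xstar, n_samples=100, tol=1e-6, rng=None):
    """Monte Carlo estimate of tr((Phi Phi^*)^+ dPhi x*(y, lam)/dy)."""
    rng = np.random.default_rng(rng)
    Q, N = Phi.shape
    J = np.flatnonzero(np.abs(D.T @ xstar) <= tol)       # co-support of D^* x*
    D_J = D[:, J]
    K = np.block([[Phi.T @ Phi, D_J],
                  [D_J.T, np.zeros((len(J), len(J)))]])  # saddle-point matrix
    pinv = np.linalg.pinv(Phi @ Phi.T)                   # (Phi Phi^*)^+
    acc = 0.0
    for _ in range(n_samples):
        z = rng.standard_normal(Q)                       # Z ~ N(0, Id_Q)
        rhs = np.concatenate([Phi.T @ z, np.zeros(len(J))])
        nu = np.linalg.lstsq(K, rhs, rcond=None)[0][:N]  # nu(z): first N entries of the solution
        acc += nu @ (Phi.T @ (pinv @ z))                 # <nu(z), Phi^* (Phi Phi^*)^+ z>
    return acc / n_samples
```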

Numerical example

Super-resolution using (anisotropic) Total-Variation

[Figure: (a) observation y; (b) x⋆(y, λ) at the optimal λ; plot of the quadratic loss versus the regularization parameter λ (projection risk: GSURE vs. true risk).]
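A toy reproduction of such a curve, reusing the hypothetical functions and data from the sketches above (not the super-resolution experiment of the poster):

```python
import numpy as np
import matplotlib.pyplot as plt

lambdas = np.linspace(0.01, 0.5, 20)
Pi = Phi.T @ np.linalg.pinv(Phi @ Phi.T) @ Phi           # projector onto Im(Phi^*)
gsure_curve, true_curve = [], []
for lam in lambdas:
    xs = solve_analysis_regularization(y, Phi, D, lam)
    gsure_curve.append(gsure(y, Phi, xs, sigma, gsure_trace_term(Phi, D, xs)))
    true_curve.append(np.sum((Pi @ x0 - Pi @ xs) ** 2))  # true projection risk (x0 known here)
plt.plot(lambdas, gsure_curve, label="GSURE")
plt.plot(lambdas, true_curve, label="True projection risk")
plt.xlabel("Regularization parameter lambda")
plt.ylabel("Quadratic loss")
plt.legend()
plt.show()
```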

Compressed-sensing using multi-scale wavelet thresholding

[Figure: (c) x_ML; (d) x⋆(y, λ) at the optimal λ; plot of the quadratic loss versus the regularization parameter λ (projection risk: GSURE vs. true risk).]

Perspectives: How to efficiently minimize GSURE(y, λ) with respect to λ?
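Pending such an efficient scheme, a simple baseline is a grid search over λ; `gsure_of_lambda` is a hypothetical callable wrapping the pieces sketched above (solve P_λ(y), estimate the trace term, evaluate GSURE).

```python
import numpy as np

def select_lambda(gsure_of_lambda, lambdas):
    """Return the grid point minimizing the estimated risk lambda -> GSURE(y, lambda)."""
    scores = np.array([gsure_of_lambda(lam) for lam in lambdas])
    return lambdas[int(np.argmin(scores))], scores
```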

References

Eldar, Y. C. (2009). Generalized SURE for exponential families: Applications to regularization. IEEE Transactions on Signal Processing, 57(2):471–481.

Pesquet, J.-C., Benazza-Benyahia, A., and Chaux, C. (2009). A SURE approach for digital signal/image deconvolution problems. IEEE Transactions on Signal Processing, 57(12):4616–4632.

Stein, C. (1981). Estimation of the mean of a multivariate normal distribution. The Annals of Statistics, 9(6):1135–1151.

Vaiter, S., Deledalle, C., Peyré, G., Dossal, C., and Fadili, J. (2012). Local behavior of sparse analysis regularization: Applications to risk estimation. arXiv preprint arXiv:1204.3212.

Vonesch, C., Ramani, S., and Unser, M. (2008). Recursive risk estimation for non-linear image deconvolution with a wavelet-domain sparsity constraint. In ICIP, pages 665–668. IEEE.

http://www.ceremade.dauphine.fr/~deledall/ [email protected]