2014 IEEE 22nd Signal Processing and Communications Applications Conference (SIU 2014), Trabzon, Turkey



SINGLE-IMAGE RESOLUTION ENHANCEMENT WITH SPARSE REPRESENTATION AND DIRECTIONALLY STRUCTURED DICTIONARIES

SINGLE IMAGE SUPER RESOLUTION BASED ON SPARSE REPRESENTATION VIA DIRECTIONALLY STRUCTURED DICTIONARIES

Fahime Farhadifard, Elham Abar, Mahmoud Nazzal, Hüseyin Ozkaramanlı

Electrical and Electronic Engineering Dep.

EMU University

ÖZETÇE

This paper proposes a method for enhancing the resolution of images with learned dictionaries carrying directional structure. The high- and low-resolution training data are clustered into different directions using templates. For each directional cluster, one low-resolution and one high-resolution dictionary are learned. In the reconstruction of the high-resolution signal, patches are classified according to their directions and the best dictionary is selected. First, the sparse codes of the low-resolution patch are computed; then, using the assumption that these codes are equal to the high-resolution codes, the high-resolution patch is computed. When the correct model (dictionary) is selected, the proposed method yields a 0.2 dB improvement over methods that do not use directional structure on the Kodak database and some standard images. This improvement can also be observed visually.

Keywords: super resolution, structured directional dictionary.

ABSTRACT

This paper introduces a single-image super-resolution algorithm based on selective sparse coding over several directionally structured learned dictionaries. The sparse code of a high-resolution (HR) image patch over an HR dictionary is assumed to be identical to that of the corresponding low-resolution (LR) patch as coded over a coupled LR dictionary. The training patches are clustered by measuring the similarity between a patch and a number of directional template sets. Each template set characterizes directional variations possessing a specific directional structure. For each cluster, a pair of directionally structured dictionaries is learned, one dictionary for each resolution level. An analogous clustering is performed in the reconstruction phase; each LR image patch is assigned to a specific cluster based on its directional structure. This decision allows for selective sparse coding of image patches, with improved representation quality and reduced computational complexity [1]. With appropriate sparse model selection, the proposed algorithm is shown to outperform a leading super-resolution algorithm which uses a pair of universal dictionaries. Simulations validate this result both visually and quantitatively, with an average improvement of 0.2 dB in PSNR over the Kodak set and some benchmark images.

Keywords: super resolution, directionally structured dictionary.

1. INTRODUCTION

Due to the high demand for HR images in many applications, super-resolution has become an active area of research. Although an HR image can be obtained using high-end cameras, such cameras are costly and impractical to deploy in some applications; moreover, the resolution of the obtained images is still insufficient in some cases, such as medical imaging, satellite imaging, and computer vision.

A recent and successful approach that overcomes the above difficulties is to use sparse representation to enhance the quality of an image. According to the Sparseland model [2], a signal can be effectively approximated as a linear combination of a few prototype signals. Such signals form the columns of a dictionary, which is typically obtained by learning over natural image patches. In this approach, sparsity is effectively used as a regularizer for the ill-posed super-resolution problem. The key idea behind such regularization is the assumption that linear relationships among HR images are preserved in their LR projections [3].

Algorithms for single-image super-resolution via sparse representation have been proposed by Yang et al. [4, 5] and Zeyde et al. [6]. The algorithm of [4] starts with a set of HR images; the corresponding LR images are obtained by applying blurring and downsampling operators. The mean value of each patch is subtracted, and the patches are then used to learn a pair of HR and LR dictionaries. The main assumption behind these algorithms is that HR and LR patches share the same sparse representation. Finally, each HR patch is reconstructed using the HR dictionary together with the sparse representation of its corresponding LR patch.

In [6], the main focus is on super-resolving the high-frequency (HF) components of an image, which is the most difficult part of the task and the one where bicubic interpolation fails. In this algorithm, both the HR and LR dictionaries are dedicated to representing the HF components of image patches. The K-SVD algorithm [7] is used for dictionary learning. In the reconstruction stage, the sparse representations of the LR patches are calculated with the OMP algorithm [8] and are then used together with the HR dictionary to recover the HR patches. The image obtained by combining those patches is added to the mid-resolution image (the bicubic-interpolated version of the LR image) to produce the HR output image.

In this paper, we adopt J. Yang's assumption that HR and LR patches approximately share the same coefficients, and extend the concept by designing directionally structured dictionaries and applying them to the super-resolution process. A structural classification of image patches based on directional structure is defined to capture information about edges, which are the most difficult parts of any image to super-resolve. Eight pairs of HR and LR dictionaries are then designed, one pair for each structural cluster. In addition, a pair of dictionaries is reserved for non-directional patches. The LR input image patches are super-resolved using the most appropriate HR dictionary, chosen based on the representation error over all LR dictionaries.

The remainder of this paper is organized as follows. The super-resolution via sparse representation approach is reviewed in Section 2.


Section 3 presents the proposed super-resolution algorithm. Quantitative and qualitative simulations are provided in Section 4, and the conclusion is given in Section 5.

2. SUPER RESOLUTION VIA SPARSE REPRESENTATION

Obtaining an HR image from a single LR image is known as "single-image super-resolution (SISR)". The LR image y is a version of the HR image x that has lost most of its high-frequency information during acquisition, transmission or storage. To solve this problem, which admits many solutions, two constraints are assumed: (i) a reconstruction constraint: based on the image observation model, the reconstructed image should be in agreement with the observed image y; and (ii) a sparsity prior: the image x can be sparsely represented over an over-complete dictionary, and it can be recovered from its LR version.

To be more precise, consider the LR image y, which is a downsampled and blurred version of the HR image x, and assume that there is an HR over-complete dictionary D_h, a large matrix whose columns are learned from HR image patches. The vectorized patches of the image x, denoted by x_i, can be sparsely represented over the dictionary D_h. Therefore, each patch can be represented as x_i = D_h a_i, where a_i is a vector with few nonzero elements (||a_i||_0 << n). The relationship between an HR patch and its LR counterpart can be expressed as:

y_i = S H x_i = L x_i    (1)

where S is a downsampling operator, H is a blurring filter and L = SH denotes their combined effect. Substituting the representation of x_i into (1) and noting that D_l = L D_h, one obtains:

y_i = L D_h a_i = D_l a_i    (2)

An implication of (2) is that y_i will also have the same sparse representation coefficients a_i over the LR dictionary D_l. Now, given the LR patches, one can obtain the representation coefficients using a vector selection method such as OMP. After obtaining the sparse representation coefficients, one can reconstruct the HR patch as:

x_i = D_h a_i    (3)
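The patch-level recovery described by (1)-(3) can be sketched as follows. The dictionaries below are random toy matrices (a real system would use learned, coupled dictionaries), and the OMP routine is a minimal greedy implementation, not the paper's exact solver:

```python
import numpy as np

def omp(D, y, sparsity):
    """Minimal Orthogonal Matching Pursuit: greedily pick atoms of D and
    least-squares fit the signal over the selected atoms."""
    residual, support = y.copy(), []
    coeffs = np.zeros(D.shape[1])
    for _ in range(sparsity):
        if np.linalg.norm(residual) < 1e-10:
            break                                   # signal already represented
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ sol
    if support:
        coeffs[support] = sol
    return coeffs

rng = np.random.default_rng(0)
n_lr, n_hr, n_atoms = 9, 36, 40            # 3x3 LR patches, 6x6 HR patches
D_l = rng.standard_normal((n_lr, n_atoms))
D_l /= np.linalg.norm(D_l, axis=0)         # unit-norm LR atoms
D_h = rng.standard_normal((n_hr, n_atoms)) # coupled HR dictionary (toy)

y = rng.standard_normal(n_lr)              # a vectorized LR feature patch
alpha = omp(D_l, y, sparsity=3)            # sparse code over D_l, as in (2)
x_hat = D_h @ alpha                        # HR patch estimate, as in (3)
```

Note that the HR patch is never coded directly; only its LR counterpart is, and the shared coefficients carry the estimate over to the HR dictionary.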

3. THE PROPOSED SUPER RESOLUTION APPROACH

The proposed algorithm consists of two main phases, training and

reconstruction as summarized in Figures 1 and 2. During the

training phase, several sets of directionally structured HR and LR

dictionary pairs are designed. These dictionaries are then used to

reconstruct HR images during the reconstruction phase.

3.1. Training Phase

This phase starts by collecting a set of HR images. LR versions of these images are then constructed using downsampling and blurring operators. To match the HR image dimensions, the LR images are scaled up to the size of the HR images via bicubic interpolation and are termed mid-resolution images. This scaling serves mainly to simplify the coding step. The main concern in this phase is to learn the most appropriate dictionaries to accurately represent the edge and texture contents of an image. Edge and texture regions constitute the HF components of an image.

Following this idea, in order to learn the HR dictionaries from the HF components in particular, the mid-resolution images are subtracted from the HR images. Local patches are then extracted and vectorized to form the HR training set. To ensure that the reconstructed HR patch is the best estimate, the calculated coefficients must fit the most relevant part of the LR images. Thus, two-dimensional (2D) Laplacian and gradient high-pass filters are employed to remove the low-frequency content of the LR images, as done in the approaches of [4-6]. This choice is reasonable because the human visual system is sensitive to the HF components of an image. In the same way as with the HR training patches, the local mid-resolution patches (features) are extracted and vectorized to form the LR training set.
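As a concrete illustration of this feature-extraction step, the sketch below applies first- and second-order derivative filters to a toy mid-resolution image. The exact filter taps used in the paper are not specified, so the central-difference gradient and second-derivative filters here are assumed values in the spirit of [4-6]:

```python
import numpy as np

def extract_features(mid_res):
    """High-pass feature maps of a mid-resolution image: horizontal/vertical
    gradients and second derivatives, which suppress low-frequency content."""
    gx = np.zeros_like(mid_res)
    gx[:, 1:-1] = mid_res[:, 2:] - mid_res[:, :-2]              # horizontal gradient
    gy = np.zeros_like(mid_res)
    gy[1:-1, :] = mid_res[2:, :] - mid_res[:-2, :]              # vertical gradient
    lx = np.zeros_like(mid_res)
    lx[:, 1:-1] = mid_res[:, 2:] - 2 * mid_res[:, 1:-1] + mid_res[:, :-2]
    ly = np.zeros_like(mid_res)
    ly[1:-1, :] = mid_res[2:, :] - 2 * mid_res[1:-1, :] + mid_res[:-2, :]
    return np.stack([gx, gy, lx, ly])                           # shape: (4, H, W)

# Toy mid-resolution image: a vertical intensity ramp (rows vary, columns do not)
img = np.outer(np.linspace(0.0, 1.0, 32), np.ones(32))
F = extract_features(img)
```

Feature patches would then be drawn from these four maps and concatenated before clustering and coding.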

Instead of using a single pair of HR and LR dictionaries, we propose the use of several directionally structured dictionary pairs. In this work, directional patterns of the 2-D space are effectively accounted for by sampling this space into eight directional orientations. Each orientation is represented by a set of directional variations that are aligned in a certain direction; the angular separation between two successive directions is therefore 22.5 degrees. The role of these directional patterns is to effectively cluster all the training patches based on their directional structure. Along with these directional clusters, a non-directional cluster is also used to contain patches that are not aligned along any of the aforementioned directional orientations. These are the patches whose correlation with the defined directional templates is less than a specific threshold, which is set empirically based on a correlation histogram.
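A minimal sketch of this template-based clustering is given below. The oriented-ramp templates and the 0.3 threshold are illustrative assumptions; the paper builds its template sets separately and sets the threshold empirically from a correlation histogram:

```python
import numpy as np

# Eight directional templates, 22.5 degrees apart, built as oriented intensity
# ramps on a 6x6 grid (illustrative; the paper's exact templates may differ).
angles = np.deg2rad(np.arange(8) * 22.5)
yy, xx = np.mgrid[0:6, 0:6]
templates = []
for a in angles:
    ramp = np.cos(a) * xx + np.sin(a) * yy       # ramp aligned with angle a
    t = (ramp - ramp.mean()).ravel()
    templates.append(t / np.linalg.norm(t))
templates = np.stack(templates)                  # (8, 36) unit-norm templates

def cluster_patch(patch_vec, templates, threshold=0.3):
    """Return the index of the best-matching directional template, or -1
    (non-directional) if the normalized correlation is below threshold."""
    v = patch_vec - patch_vec.mean()
    n = np.linalg.norm(v)
    if n == 0:
        return -1                                # flat patch: non-directional
    corr = templates @ (v / n)                   # normalized correlation per direction
    best = int(np.argmax(np.abs(corr)))
    return best if abs(corr[best]) >= threshold else -1

# A patch that is itself a ramp along direction 2 (45 degrees)
patch = (np.cos(angles[2]) * xx + np.sin(angles[2]) * yy).ravel()
cluster = cluster_patch(patch, templates)        # assigned to direction 2
```

In the full system the same rule is applied to every training patch, and patches falling below the threshold populate the ninth, non-directional cluster.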

After clustering, and before dictionary learning, Principal Component Analysis (PCA) is applied to the clustered patches as a dimensionality reduction tool. PCA significantly reduces the dimensionality of the training patches while preserving most of their energy, and therefore reduces the computational complexity of dictionary learning. The patches of each cluster are then fed to the K-SVD algorithm to train an LR dictionary. The sparse representation coefficients of these LR patches, together with the corresponding HR patches, are used to calculate the corresponding HR dictionary. As outlined in [6], the HR dictionary calculation is formulated as in (4):

D_h^k = X_h^k A_k^T (A_k A_k^T)^{-1}    (4)

where A_k is the matrix containing all the sparse coefficient vectors obtained with the LR dictionary of cluster k, and X_h^k contains the corresponding HR training patches.

The main steps of the proposed directionally structured dictionary learning phase are outlined in Algorithm 1.

Algorithm 1: The training phase
Input: A set of HR training images.
1. Obtain the LR images.
2. Extract patches and features.
3. Design eight directional template sets.
4. Cluster patches and features into eight directional clusters and one non-directional cluster.
5. Employ the K-SVD algorithm to learn the corresponding LR dictionaries.
6. Calculate the HR dictionaries.
Output: Directionally structured HR and LR dictionaries.

Fig 1: Training phase of the proposed algorithm.
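The HR dictionary update of (4) is a least-squares fit of the HR training patches given their fixed LR sparse codes. A toy sketch follows, with dense random stand-ins for the (normally sparse) code matrix A_k:

```python
import numpy as np

rng = np.random.default_rng(1)
n_hr, n_atoms, n_patches = 36, 130, 500
A = rng.standard_normal((n_atoms, n_patches))   # stand-in for the code matrix A_k
X_h = rng.standard_normal((n_hr, n_patches))    # HR training patches of one cluster

# Eq. (4): D_h = X_h A^T (A A^T)^{-1}, the minimizer of ||X_h - D_h A||_F,
# computed here via the Moore-Penrose pseudo-inverse.
D_h = X_h @ np.linalg.pinv(A)
```

Since A has full row rank here, `pinv(A)` equals A^T (A A^T)^{-1}, so D_h satisfies the normal equations D_h A A^T = X_h A^T.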

3.2. Reconstruction Phase

The reconstruction phase adopts the same rationale as the training phase. A given LR image to be reconstructed first needs to be rescaled to the size of the HR image; this is done by bicubic interpolation. The resulting so-called mid-resolution image is then filtered to extract the meaningful features in exactly the same way as in the learning phase. Features (patches of the image interpolated to the mid-resolution level) are then extracted, to be recovered using the most suitable dictionary.

Each LR feature is sparsely coded over all LR dictionaries, and the representation error between the reconstructed feature and the original one is calculated. This error is adopted as the clustering criterion in this work: the LR dictionary that yields the smallest representation error decides which cluster the patch belongs to, and thus which dictionary pair will be used for the sparse reconstruction of that particular patch. The rationale of the testing phase is outlined in Algorithm 2.

Algorithm 2: The testing phase
Input: An LR image and nine pairs of dictionaries.
1. Extract features.
2. Select a dictionary pair via the model selection rule.
3. Sparsely code each patch over the selected LR dictionary.
4. Reconstruct the HR patches.
Output: Combine the HR patches and add the resulting image to the mid-resolution one to construct the HR image.

Fig 2: Reconstruction phase of the proposed algorithm.
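The dictionary selection rule of Algorithm 2 can be sketched as follows. The nine toy dictionaries and the minimal OMP routine are illustrative assumptions standing in for the learned cluster dictionaries:

```python
import numpy as np

def omp(D, y, sparsity):
    """Minimal OMP: greedy atom selection with a least-squares refit."""
    residual, support = y.copy(), []
    coeffs = np.zeros(D.shape[1])
    for _ in range(sparsity):
        if np.linalg.norm(residual) < 1e-10:
            break
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ sol
    if support:
        coeffs[support] = sol
    return coeffs

def select_dictionary(y, lr_dicts, sparsity=3):
    """Code y over every LR dictionary; return the index of the dictionary
    with the smallest representation error, plus the winning sparse code."""
    best_k, best_err, best_code = -1, np.inf, None
    for k, D in enumerate(lr_dicts):
        a = omp(D, y, sparsity)
        err = np.linalg.norm(y - D @ a)
        if err < best_err:
            best_k, best_err, best_code = k, err, a
    return best_k, best_code

rng = np.random.default_rng(2)
dicts = [rng.standard_normal((9, 30)) for _ in range(9)]  # nine toy LR dictionaries
dicts = [D / np.linalg.norm(D, axis=0) for D in dicts]
y = dicts[4][:, 0]                 # a feature built from cluster 4's first atom
k, code = select_dictionary(y, dicts)
```

Here the feature is an atom of dictionary 4, so that dictionary represents it with zero error and wins the selection; the matching HR dictionary of cluster k would then be used for reconstruction.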

4. SIMULATION AND RESULTS

This section presents experiments inspecting the performance of the proposed algorithm, as compared to the algorithm of Zeyde et al. [6] and to bicubic interpolation. The algorithm of [6] is referred to as the baseline algorithm throughout this paper. Tests are conducted on the images of the Kodak set along with some other benchmark images. The results also include the performance of the proposed algorithm when it is armed with correct model selection.

4.1. Quantitative Experimentation (PSNR)

Herein, the test images are blurred and downsampled to a quarter of their original dimensions to obtain their LR versions. The LR versions are then scaled up to their original sizes via the three aforementioned approaches. The Peak Signal-to-Noise Ratio (PSNR) measure, described in (5), is used for quantitative performance evaluation:

PSNR = 10 log_10 (255^2 / MSE)    (5)

where z is the true image of size M x N, z_hat is its estimate, and MSE denotes the mean-square error between z and z_hat, defined as:

MSE = (1 / (M N)) sum_i sum_j (z(i, j) - z_hat(i, j))^2    (6)
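Equations (5) and (6) can be computed directly; the sketch below uses a constant-error toy image so the expected value is easy to check by hand:

```python
import numpy as np

def psnr(z, z_hat, peak=255.0):
    """PSNR in dB between the true image z and its estimate z_hat, per (5)-(6)."""
    mse = np.mean((z.astype(float) - z_hat.astype(float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

z = np.full((8, 8), 100.0)
z_hat = z + 10.0           # constant error of 10 gray levels -> MSE = 100
print(psnr(z, z_hat))      # 10*log10(255^2/100), about 28.13 dB
```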

Table 1 lists the PSNR values of the conducted simulations. In this simulation, the proposed algorithm uses a sparsity level of 3 and a patch size of 3×3 for the LR images (6×6 for the HR images), such that the trained HR dictionaries have a dimension of 36×130. Twenty K-SVD iterations are used to train the LR dictionaries. The baseline method is used with the same configuration, and its dictionary dimension is 9×(36×130).

According to Table 1, the proposed algorithm has almost the same performance as the baseline algorithm of [6]. It is worth mentioning that choosing the best HR dictionary based on features is not a trivial task, and the correct HR dictionary is not always chosen. With the designed directionally structured dictionaries and a model selection that always chooses the correct HR dictionary, the proposed algorithm achieves an average PSNR improvement of 0.2 dB over the baseline algorithm and 1.5 dB over bicubic interpolation.

From the results in this table, it can be observed that the PSNR improvements are noticeable for images with a directional nature. For example, with correct model selection, the PSNR improvement on the Barbara image is 0.5 dB over the baseline algorithm and 1.18 dB over bicubic interpolation. Likewise, the PSNR improvement for the Zone-plate image, a curvy directional image, is around 1 dB over the baseline algorithm and 1.54 dB over bicubic interpolation.

Table 1: PSNR (dB) comparison of bicubic interpolation, the baseline algorithm, the proposed algorithm, and the proposed algorithm with correct model selection.

Image      | Bicubic | Baseline | Proposed | Proposed (correct model selection)
K.1        | 26.7    | 27.85    | 27.93    | 28.15
K.2        | 34.0    | 35.04    | 35.02    | 35.14
K.3        | 35.0    | 36.66    | 36.58    | 36.66
K.4        | 34.6    | 36.03    | 35.99    | 36.03
K.5        | 27.1    | 28.95    | 28.95    | 29.21
K.6        | 28.3    | 29.42    | 29.35    | 29.66
K.7        | 34.3    | 36.33    | 36.22    | 36.36
K.8        | 24.3    | 25.5     | 25.46    | 25.94
K.9        | 33.1    | 35.04    | 35.03    | 35.16
K.10       | 32.9    | 34.75    | 34.75    | 34.79
K.11       | 29.9    | 31.14    | 31.13    | 31.39
K.12       | 33.6    | 35.58    | 35.38    | 35.58
K.13       | 24.7    | 25.54    | 25.5     | 25.8
K.14       | 29.9    | 31.3     | 31.25    | 31.47
K.15       | 32.9    | 34.9     | 34.72    | 34.9
K.16       | 32.1    | 32.84    | 32.82    | 33.1
K.17       | 32.9    | 34.38    | 34.38    | 34.47
K.18       | 28.8    | 29.89    | 29.85    | 30.07
K.19       | 28.8    | 30.04    | 30.01    | 30.43
K.20       | 32.4    | 34.11    | 34.04    | 34.18
K.21       | 29.3    | 30.36    | 30.32    | 30.65
K.22       | 31.4    | 32.59    | 32.56    | 32.72
K.23       | 35.9    | 37.9     | 37.98    | 37.97
K.24       | 27.6    | 28.62    | 28.57    | 28.82
Baboon     | 24.9    | 25.46    | 25.46    | 25.77
Barbara    | 28.0    | 28.66    | 28.55    | 29.14
Foreman    | 34.1    | 33.78    | 33.75    | 33.97
Face       | 34.8    | 35.56    | 35.54    | 35.64
Lena       | 34.7    | 36.23    | 36.23    | 36.29
Man        | 29.2    | 30.51    | 30.48    | 30.69
Zebra      | 30.6    | 33.21    | 33.12    | 33.35
Zone-plate | 12.7    | 13.27    | 13.2     | 13.97
Elaine     | 31.1    | 31.31    | 31.31    | 31.32
Average    | 30.29   | 31.59    | 31.56    | 31.77


Figure 3: Visual comparison of Barbara and Zone-plate. From left to right, the insets correspond to: original, bicubic interpolation, the baseline method [6], and the proposed method with correct model selection.

4.2 Qualitative Experimentation

Figure 3 presents the original scenes of the Barbara and Zone-plate benchmark images alongside their reconstructions obtained with bicubic interpolation, the baseline algorithm, and the proposed algorithm with correct model selection, respectively. Selected zoomed regions are shown to clarify the comparison. For both examples, it can be observed that the bicubic-interpolation reconstruction is blurry. Blur is less severe in the reconstruction of the baseline algorithm. The proposed algorithm, however, is better able to reconstruct sharp edges; in particular, directions are more correctly reconstructed with the proposed algorithm. These visual results are in line with the PSNR results presented in Table 1.

5. CONCLUSION

A single-image super-resolution algorithm has been presented in this paper. The algorithm is based on sparse representation over directionally structured dictionaries. The key idea is to classify image patches based on their directional structures and to selectively code each patch using the most appropriate dictionary. In total, eight directional and one non-directional dictionary pairs are used. The K-SVD algorithm is used for dictionary learning, and the sparse representation error is the criterion used for the classification. The effect of dictionary redundancy was studied empirically, and a patch size of 3×3 with a dictionary size of 9×130 was found to be a good compromise between representation quality and computational complexity.

Experimental results indicate the usefulness of the proposed selective sparse coding over directionally structured dictionaries. Compared to bicubic interpolation, the proposed algorithm achieves an average PSNR improvement of 1.3 dB over the Kodak set and some benchmark images. However, it produces results comparable to the baseline algorithm of Zeyde et al. [6] when only information from the LR image patches is used for the classification process. When the classification infers some information from the true HR image patches, the proposed algorithm outperforms the baseline algorithm with an average PSNR improvement of 0.2 dB. Visual tests verify this quantitative result. This finding suggests the need for a more carefully designed sparse model selection in the testing phase.

6. REFERENCES

[1] M. Elad and I. Yavneh, "A plurality of sparse representations is better than the sparsest one alone," IEEE Transactions on Information Theory, vol. 55, pp. 4701-4714, 2009.

[2] M. Elad and M. Aharon, "Image denoising via sparse and redundant representations over learned dictionaries," IEEE Transactions on Image Processing, vol. 15, pp. 3736-3745, 2006.

[3] D. L. Donoho, "Compressed sensing," IEEE Transactions on Information Theory, vol. 52, pp. 1289-1306, 2006.

[4] J. Yang, J. Wright, T. Huang and Y. Ma, "Image super-resolution as sparse representation of raw image patches," in IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, 2008, pp. 1-8.

[5] J. Yang, J. Wright, T. Huang and Y. Ma, "Image super-resolution via sparse representation," IEEE Transactions on Image Processing, vol. 19, pp. 2861-2873, 2010.

[6] R. Zeyde, M. Elad and M. Protter, "On single image scale-up using sparse representations," in the 7th International Conference on Curves and Surfaces, Avignon, France, 2012, pp. 711-730.

[7] M. Aharon, M. Elad and A. Bruckstein, "K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation," IEEE Transactions on Signal Processing, vol. 54, pp. 4311-4322, 2006.

[8] Y. C. Pati, R. Rezaiifar and P. S. Krishnaprasad, "Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition," in Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, 1993.
