

Gaussian Foveation

Daniela Filipa Gonçalves Pamplona

Dissertation submitted for the degree of Master in Mathematics and Applications

Jury

President: Prof. Ana Bela Cruzeiro
Supervisors: Prof. Alexandre José Malheiro Bernardino, Prof. Carlos José Santos Alves
Member: Ana Leonor Silvestre

September 2008


Contents

1 Introduction

2 Operators and Matrices
2.1 Operators on Hilbert spaces
2.2 Matrices

3 Functions and (Cartesian) Images
3.1 Sampling and Reconstruction
3.2 Image Geometrical Transforms
3.3 Conclusions

4 Foveation with Super Pixel Methods
4.1 Human Eye Structure
4.2 From Cartesian to Log-polar Images
4.3 Image Transforms
4.4 Conclusions

5 Gaussian Foveation
5.1 Ganglion cells on the Human eye
5.2 Gaussian Foveation
5.3 Image Transforms with Gaussian Foveation
5.4 Conclusions

6 Super Pixel vs Gaussian Foveation
6.1 Reconstruction
6.2 Tracking
6.3 Conclusions

7 Conclusions


Acknowledgements

My first thanks go to the person who contributed most to this work: Professor Alexandre Bernardino, whom I thank for all his availability, competence, clear-sightedness and, beyond that, patience in supervising this work. His vision of what it is to do (or to try to do) science was crucial for my development, both personal and professional, and is impossible to forget.

Next, I thank Professor Carlos Alves for his commitment, his rigour, and all the useful and stimulating contributions that guided me throughout the work.

To everyone at Vislab, thank you for all the support and encouragement, the Tuesday meetings, and the silly conversations about silly things...

Naturally, I cannot fail to address all the "little friends" from Tecnico who always helped me get over the "kicks that life gives us" (in random order): Ze Nuno, Lacerda, Joana, Rocha, Meggy, Ben, Mini, Sara, Renato, Dudu, Bibi, Iara, Hugo, Andre Paulo...

To my favourite artists, who are also my most curious fans: Xeque and Marta, thank you for all the times you questioned me about the thesis (and made me rethink everything all over again).

I would like to thank all my friends from Germany. You are now scattered all over the globe, yet always with me. Johannes, Laundry, Ilze, Mary, Francesco, Daniel, Hendrik, Funda, Davide, Miguel, Alice, Alcindo, Mariya, Marcelo, Saku, Nazli... and my German brother, Mamoruuuuuuuuuuuu!!!

Finally, I want to thank my family, particularly the women who inspire me to face problems: my mother and my grandmother.

To my love, I just don't know how to thank you...


Chapter 1

Introduction

Sight is one of the most important senses for human beings. In fact, visual perception is so crucial for humans that the visual cortex occupies about one third of the surface of the cerebral cortex. Humanoid robots are meant to interact with human beings; since the design of the retina is adapted to the lifestyle of each animal and human vision is so capable, biologically inspired models could be implemented in these robots [7].

Visual information is absorbed in the eye by the photoreceptors of the retina and is then sent to the primary visual cortex (Brodmann area 17), where its processing begins [6]. But the cortical image is not exactly equal to the real-world one: in the real world the resolution is constant, whereas the cortical image has more information in the center and less in the periphery, which provides data compression while keeping the central information. Visual perception in biological systems is often characterized by space-variant acquisition and processing mechanisms, which reduce the amount of instantaneous information to process and allow fast reactions to external events.

Figure 1.1: Brodmann areas on human brain


Figure 1.2: World image, retinal image and cortical image

The process that transforms the usual images with constant resolution into images with high resolution in the center, decreasing radially (like the human cortical image), is called foveation. In recent years several foveation methods have been presented, most of them based on Super Pixel approaches [14]. This class comprises all the methods that build a new partition of the image into super pixels. This partition is biologically inspired by the distribution of the photoreceptors responsible for the detection of fine details: the cones. The super pixel value is just an average of the corresponding cartesian values, which yields a foveated image. These super pixel methods have several benefits; in fact, they represent eccentric human vision: the image has more information in the center than in the periphery, providing the desired effect on the foveated images. However, their biological inspiration can be improved.

In this thesis a method (called Gaussian Foveation) is proposed to simulate and process cortical images. This method adds to the Super Pixel methods the information about the receptive fields of a specific neural cell of the eye: the ganglion cells. These cells receive information from several photoreceptors and send it to the brain. It is from this information that the cortical image is formed. The aim of Gaussian Foveation is to model foveated images as the response of some kernels. The kernels represent the shape and sensitivity of ganglion cells. The foveated images are generated by sampling the information on the retina with receptive fields of Gaussian shape. A method is presented for the inverse mapping (image reconstruction) and for image geometrical transformations (translations, rotations, scalings), derived with the support of Operator Theory and implemented with simple matrix operations. We show, through experimental evaluations, the performance of the proposed methods in comparison with classical ones, in terms of image reconstruction and tracking errors. In general, the proposed method improves the results and, more importantly, it provides a more elegant formulation of the foveation process that, we believe, will prove beneficial in many other applications.

This thesis has 7 chapters, organized as follows. The first and last chapters introduce and conclude, respectively, the work presented in the thesis. The second chapter presents results from Operator Theory and matrices. The third chapter introduces the basics of images. The fourth and fifth chapters are devoted to the foveation approaches, Super Pixel and Gaussian Foveation, which are compared in the sixth chapter, specifically on the quality of the reconstructed image and on the tracking problem.


Chapter 2

Operators and Matrices

The aim of this chapter is to recall some basic results on operators and matrices that will be relevant to this work. In the first section we consider operators defined on Hilbert spaces; most of these results can be found in elementary books on functional analysis or numerical analysis [1]. An operator will be seen as a map between Hilbert spaces; the simplicity of this definition does not mean that the resulting theory is empty. In fact, operators have very important properties that generate the results used in the present work. In the last decades, Hilbert spaces have been used as a crucial tool in several branches of engineering.

2.1 Operators on Hilbert spaces

We start by recalling the notion of Hilbert space.

Consider a vector space $V$ over a field $K = \mathbb{R}$ or $\mathbb{C}$, with an inner product

$$\langle \cdot, \cdot \rangle : V \times V \longrightarrow K;$$

this defines a pre-Hilbert space. The inner product defines a topology induced by the associated norm

$$\|\cdot\| : V \longrightarrow [0, \infty[, \qquad \|v\| = \sqrt{\langle v, v \rangle},$$

with a distance given by $d(x, y) = \|x - y\|$, and neighborhoods defined accordingly. The norm map allows us to import topological notions from the real numbers, and this topology makes it easy to extend the notions of Cauchy sequences and convergence. For instance, $(x_n)$ converges to $x \in V$ if

$$\|x - x_n\| \to 0 \quad (n \to \infty).$$

A Hilbert space is just a pre-Hilbert space where we have completeness, meaning that every Cauchy sequence is convergent.

It should be noticed that if $V$ is a pre-Hilbert space, its closure (completion) $\mathrm{cl}(V) = H$ is a Hilbert space. Taking the closure makes every Cauchy sequence convergent, by a process similar to the construction of the real numbers as Cauchy sequences of rational numbers.

We will make use of two classical Hilbert spaces, $\mathbb{R}^n$ and $L^2(X)$, corresponding to the discrete and continuous settings.

– Discrete setting: we consider $l^2(\{x_1, \dots, x_n\})$, a space of discrete functions defined on collocation points $\{x_1, \dots, x_n\}$, which is isomorphic to $\mathbb{R}^n$, with inner product and norm given by

$$\langle f, g \rangle_{l^2} = \sum_{i=1}^{n} f(x_i)\, g(x_i), \qquad \|f\|_{l^2}^2 = \sum_{i=1}^{n} f(x_i)^2.$$

– Continuous setting: we consider $L^2(X)$, the space of square Lebesgue-integrable functions on a bounded set $X$, with inner product and norm given by

$$\langle f, g \rangle_{L^2} = \int_X f(x)\, g(x)\, dx, \qquad \|f\|_{L^2}^2 = \int_X f(x)^2\, dx.$$

(Note that this last setting reduces to the discrete setting if we consider a discrete set $X$ with the counting measure.)

Remark Function spaces of standard continuous or differentiable functions, like $C^m(X)$, are complete only with the usual maximum (or supremum) norms, but these norms are not induced by inner products. For those function spaces one should consider Banach space theory instead of Hilbert space theory, where the role of the inner product is played by duality [1]. Hilbert spaces are just a particular case of Banach spaces where, besides a norm, an inner product is also available.

From the definition of inner product, it is possible to define the concept of orthogonality.

Definition Orthogonality

If $v, w$ are two vectors of a Hilbert space $H$, $v$ is orthogonal to $w$ (written $v \perp w$) if

$$\langle v, w \rangle = 0.$$


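If $B$ is a subspace of $H$, the orthogonal subspace is

$$B^{\perp} = \{ v \in H : v \perp B \}.$$

Given a closed subspace $B$, we have the direct sum $H = B \oplus B^{\perp}$ [13]. For a general subspace it is important to notice that

$$(B^{\perp})^{\perp} = \mathrm{cl}(B), \qquad (2.1)$$

and not $B$, unlike the finite dimensional case, where $\mathrm{cl}(B) = B$. For instance, let $H$ be a Hilbert space with an infinite basis $\{b_1, b_2, \dots\}$ and let $B$ be the set of finite linear combinations of the $b_i$. Then $B^{\perp} = \{0\}$, because

$$v \in B^{\perp} \Rightarrow v \perp \{b_1, b_2, \dots\} \Rightarrow v = 0,$$

which means $(B^{\perp})^{\perp} = \{0\}^{\perp} = H$. The result holds because $H = \mathrm{cl}(B)$, since the $b_i$ form a basis.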

Bounded and Compact Operators

Operators in Hilbert spaces may be regarded as maps from one Hilbert space to another. In the simplest situation we consider linear operators, noticing that most notions try to generalize concepts from matrices, which are operators in finite dimensional Hilbert spaces.

Definition Bounded Linear Operator

For any Hilbert spaces $H_1$ and $H_2$, $A : H_1 \to H_2$ is a bounded linear operator if it is linear,

$$A(\alpha x + \beta y) = \alpha A(x) + \beta A(y), \quad \forall x, y \in H_1,\ \forall \alpha, \beta \in K,$$

and if its norm is bounded,

$$\|A\|_{L(H_1,H_2)} = \sup_{\|x\|_1 \le 1} \|A(x)\|_2 < \infty.$$

Here $L(H_1, H_2)$ denotes the set of all bounded linear operators that map $H_1$ into $H_2$, and $\|\cdot\|_1$, $\|\cdot\|_2$ are the norms of $H_1$ and $H_2$. If $H_1 = H_2 = H$, we just write $L(H)$.

If $H_1$ and $H_2$ are finite dimensional spaces, then linear operators may be represented by matrices once bases are fixed, and in this finite dimensional situation they are bounded by a norm of the matrix.

Bounded linear operators are continuous, meaning that for every sequence $(x_n)$

$$x_n \to x \Rightarrow A(x_n) \to A(x),$$

which is straightforward to deduce since

$$\|A(x_n) - A(x)\|_2 = \|A(x_n - x)\|_2 \le \|A\|_{L(H_1,H_2)}\, \|x_n - x\|_1 \to 0.$$

It is also true that continuity implies boundedness and, moreover, since the operator is linear, by shifting the origin one just needs to check continuity at zero, or at any other point. This is not so surprising, as this property is shared by linear operators in finite dimensional spaces. For nonlinear operators, as for nonlinear real functions, this property is no longer true.

In finite dimensional situations, bounded sequences have convergent subsequences, and continuous functions keep this property. However, in infinite dimensions this is no longer true: the sequence defined by an infinite orthonormal basis is bounded but has no convergent subsequence.

This fact leads to the notion of compactness and to the notion of compact operator.

Definition Sequentially compact operator

A linear operator $A$ is (sequentially) compact if, for any bounded sequence $(x_n)$, the mapped sequence $A(x_n)$ always has a convergent subsequence.

A compact operator is continuous, but the converse may not be true.

Compact operators play an important role in operator theory, since they are the operators in infinite dimensions that share the most properties with matrices (which are linear operators in finite dimensional spaces). The spectrum of a compact operator may be infinite, but it is discrete, bounded, and has at most one accumulation point (at zero).

This implies serious limitations on the inversion of compact operators.

Range, Kernel, and Inversion

Associated with any operator $A \in L(H_1, H_2)$ there are subspaces of $H_1$ and $H_2$: the kernel,

$$\mathrm{Ker}(A) = \{ x \in H_1 : A(x) = 0 \},$$

and the range,

$$R(A) = \{ y \in H_2 : \exists x \in H_1,\ A(x) = y \}.$$

These notions are directly related to injectivity, meaning $\mathrm{Ker}(A) = \{0\}$, and to surjectivity, meaning $R(A) = H_2$, and therefore to the invertibility of an operator.

Definition Inverse Operator

An operator $A$ between $H_1$ and $H_2$ is called invertible if there is an operator $A^{-1}$ from $H_2$ to $H_1$ such that:

(i) $A^{-1}(A(x)) = x, \ \forall x \in H_1$

(ii) $A(A^{-1}(y)) = y, \ \forall y \in H_2$

If $A^{-1}$ satisfies only condition (i) or only condition (ii), $A$ is said to be left invertible or right invertible, respectively.

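Example 2.1 If $L^2(\mathbb{R})$ is the Hilbert space of square-integrable functions, the Fourier transform

$$\mathcal{F} : L^2(\mathbb{R}) \longrightarrow L^2(\mathbb{R}), \qquad \mathcal{F}f(s) = \int_{-\infty}^{\infty} f(t)\, e^{-2\pi i s t}\, dt$$

is an invertible operator and its inverse is given by

$$\mathcal{F}^{-1} : L^2(\mathbb{R}) \longrightarrow L^2(\mathbb{R}), \qquad \mathcal{F}^{-1}F(t) = \int_{-\infty}^{\infty} F(s)\, e^{2\pi i s t}\, ds.$$

(With the factor $2\pi$ placed in the exponent, no normalizing constant is needed in front of the inverse integral.)

Invertibility may be easily checked for the identity perturbed by a contraction, meaning an operator with norm smaller than one. In this case we have an expansion that is called the Neumann series [1].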

Lemma 2.2 ([4]) (Neumann series) Suppose $A : H \longrightarrow H$ is linear and bounded, with $\|A\| < 1$. Then $I - A$ is invertible, and for every $y \in H$,

$$(I - A)^{-1} y = \sum_{k=0}^{\infty} A^k y.$$

Moreover,

$$\|(I - A)^{-1}\| \le \frac{1}{1 - \|A\|}.$$
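The finite dimensional case of the lemma can be checked numerically. The following is a minimal sketch, assuming NumPy and a random $4 \times 4$ matrix rescaled so that its spectral norm is $0.5$; the helper name `neumann_inverse_apply` and the number of terms are illustrative choices, not part of the original text.

```python
import numpy as np

def neumann_inverse_apply(A, y, terms=200):
    """Approximate (I - A)^{-1} y by the partial sum of A^k y for k = 0..terms-1."""
    total = np.zeros_like(y, dtype=float)
    term = y.astype(float)           # A^0 y
    for _ in range(terms):
        total += term
        term = A @ term              # next power of A applied to y
    return total

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
A *= 0.5 / np.linalg.norm(A, 2)      # rescale so that ||A||_2 = 0.5 < 1
y = rng.standard_normal(4)

x_series = neumann_inverse_apply(A, y)
x_direct = np.linalg.solve(np.eye(4) - A, y)

# Norm bound of the lemma: ||(I - A)^{-1}|| <= 1 / (1 - ||A||)
norm_inverse = np.linalg.norm(np.linalg.inv(np.eye(4) - A), 2)
bound = 1.0 / (1.0 - np.linalg.norm(A, 2))
```

Here `x_series` and `x_direct` should agree to rounding error, and `norm_inverse` should not exceed `bound`.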

Lemma 2.3 ([4]) (Riesz representation theorem).

If $f$ is a bounded linear functional on a Hilbert space $H$, then there exists a unique $y \in H$ such that, for all $x \in H$,

$$f(x) = \langle x, y \rangle.$$

Moreover, the correspondence is an isometry: $\|f\| = \|y\|$.


Adjoint operators

Definition Adjoint operator

Consider an operator $A$ that maps $H_1$ into $H_2$. The operator $A^*$ is the adjoint operator of $A$ if

$$\forall x \in H_1,\ \forall y \in H_2: \quad \langle A(x), y \rangle = \langle x, A^*(y) \rangle.$$

Now the purpose of the adjoint operator should be clear: if it is hard to calculate $\langle A(x), y \rangle$ and $A$ has an adjoint, one can simply apply the adjoint to $y$ instead. But this leaves us with two questions: which operators have an adjoint, and is it simple to identify them?

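Theorem 2.4 ([4]) Let $A$ be an operator from $H_1$ into $H_2$. If $A$ is linear and bounded, then there exists $A^*$, the adjoint operator of $A$. Moreover, $A^* \in L(H_2, H_1)$.

Proof Suppose $A \in L(H_1, H_2)$. Then, for each $y \in H_2$, the functional $f_y(x) = \langle A(x), y \rangle$ is a bounded linear functional on $H_1$. Hence, from Lemma 2.3, there is a unique $y^* \in H_1$ such that, for all $x \in H_1$,

$$\langle A x, y \rangle = f_y(x) = \langle x, y^* \rangle = \langle x, A^*(y) \rangle,$$

where $A^*(y) := y^*$ defines the adjoint operator of $A$. Let us now verify that $A^*$ is linear and bounded. For all $x \in H_1$,

$$\langle A(x), \alpha y + \beta z \rangle = \bar{\alpha} \langle A(x), y \rangle + \bar{\beta} \langle A(x), z \rangle = \bar{\alpha} \langle x, A^*(y) \rangle + \bar{\beta} \langle x, A^*(z) \rangle = \langle x, \alpha A^*(y) + \beta A^*(z) \rangle,$$

so $A^*(\alpha y + \beta z) = \alpha A^*(y) + \beta A^*(z)$ (the conjugates disappear when $K = \mathbb{R}$). On the other hand,

$$\|A\| = \sup_{\|x\| = \|y\| = 1} |\langle A(x), y \rangle| = \sup_{\|x\| = \|y\| = 1} |\langle x, A^*(y) \rangle| = \|A^*\|.$$

It is clear that $0 = 0^*$ and $I = I^*$, and if $A$ is a matrix operator, i.e., an operator that can be described by a matrix $M$, then $A^*$ is also a matrix operator and it is described by $\overline{M}^{\top}$, the conjugate transpose of $M$.

Also, we have the following identities:

$$(A + B)^* = A^* + B^*, \quad (\alpha A)^* = \bar{\alpha} A^*, \quad (A^*)^* = A, \quad (DA)^* = A^* D^*.$$

An operator is called self-adjoint whenever $A = A^*$; this corresponds to the notion of Hermitian matrices (symmetric matrices, if the entries are real).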
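For matrix operators the defining identity of the adjoint can be verified directly. This is a small sketch assuming NumPy, a random complex $3 \times 2$ matrix, and an inner product that is linear in the first argument (the convention implied by $f(x) = \langle x, y \rangle$ in Lemma 2.3); the helper `inner` and the scalar `alpha` are introduced here only for illustration.

```python
import numpy as np

def inner(u, v):
    """Inner product on C^n, linear in the first argument: sum_i u_i * conj(v_i)."""
    return np.sum(u * np.conj(v))

rng = np.random.default_rng(1)
M = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))   # maps C^2 into C^3
x = rng.standard_normal(2) + 1j * rng.standard_normal(2)
y = rng.standard_normal(3) + 1j * rng.standard_normal(3)

M_adj = M.conj().T                   # conjugate transpose, the matrix of the adjoint

lhs = inner(M @ x, y)                # <A(x), y>
rhs = inner(x, M_adj @ y)            # <x, A*(y)>

# Identity (alpha A)* = conj(alpha) A*
alpha = 2.0 - 3.0j
scaled_adj = (alpha * M).conj().T
```

The two scalars `lhs` and `rhs` should coincide up to rounding, and `scaled_adj` should equal `np.conj(alpha) * M_adj`.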

There are some relations between operators, adjoint operators, spaces and dual spaces that can be helpful:

Theorem 2.5 [13]

Let $H_1$ and $H_2$ be Hilbert spaces, and $S \in L(H_1, H_2)$.

1. $\mathrm{Ker}(S) = R(S^*)^{\perp}$

2. $\mathrm{cl}(R(S)) = \mathrm{Ker}(S^*)^{\perp}$

Proof 1. Let $x \in \mathrm{Ker}(S)$ and $y \in H_2$. Then

$$0 = \langle S(x), y \rangle = \langle x, S^*(y) \rangle,$$

which means $\mathrm{Ker}(S) \subseteq R(S^*)^{\perp}$. On the other hand, if $x \in R(S^*)^{\perp}$, then for all $y \in H_2$ we have

$$0 = \langle x, S^*(y) \rangle = \langle S(x), y \rangle,$$

so, in particular for $y = S(x)$, $\langle S(x), S(x) \rangle = \|S(x)\|^2 = 0$, which means $S(x) = 0$, i.e., $x \in \mathrm{Ker}(S)$.

2. We just need to apply part 1 to $S^*$, using $(S^*)^* = S$, and take orthogonal complements, noticing that

$$\left( R(S)^{\perp} \right)^{\perp} = \mathrm{cl}(R(S)).$$

This is an important result. Suppose that we have an operator $S$ whose adjoint is injective; then $\mathrm{cl}(R(S)) = \{0\}^{\perp} = H_2$, that is, the range of $S$ is dense in $H_2$.

For instance, suppose that $S : L^2(X_1) \to L^2(X_2)$ is given in integral form, as a convolution,

$$S(f)(x) = \int_{X_1} G(x - y)\, f(y)\, dy;$$

then its adjoint with respect to $L^2$ is given by

$$S^*(f)(y) = \int_{X_2} G(x - y)\, f(x)\, dx,$$

and if one proves its injectivity, then we are able to approximate any $L^2(X_2)$ function by a sequence $S(f_n)$.

As an example related to our work, we may consider the integral kernel $G$ given by a Gaussian,

$$G(x) = \exp(-|x|^2 / \sigma^2).$$

It should be noticed that these integral kernels lead to compact operators and, even if they are invertible, for smoother kernels $G$ the sequence of eigenvalues converges more rapidly to zero, compromising invertibility.

In the following we will consider discrete versions of this type of integral operator, for the Gaussian approximation, in the sense that

$$\int_{X_1} G(x - y)\, f(y)\, dy \approx \sum_{j=1}^{m} w_j f_j\, G(x - y_j),$$

where the $w_j$ are quadrature weights, which may be combined with $f_j$ into a single coefficient.

Figure 2.1: Projection on the column space of a 3 × 2 matrix

While trying to approximate a function with these linear combinations we are led to a system that inherits the ill-conditioning problems associated with the fact that $S$ is a compact operator. Therefore we must consider appropriate techniques to circumvent these conditioning issues, even in the context of matrices. In the following section we consider the pseudo-inverse, the singular value decomposition and Tikhonov regularization.
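The conditioning issue can be illustrated on a small collocation matrix built from the Gaussian kernel $G(x) = \exp(-|x|^2/\sigma^2)$. The sketch below assumes NumPy, twelve equally spaced one-dimensional points on $[0, 1]$ and two illustrative values of $\sigma$; none of these choices come from the original text.

```python
import numpy as np

def gaussian_kernel_matrix(points, sigma):
    """Matrix with entries G(x_i - x_j) = exp(-|x_i - x_j|^2 / sigma^2)."""
    diff = points[:, None] - points[None, :]
    return np.exp(-diff**2 / sigma**2)

points = np.linspace(0.0, 1.0, 12)

# Narrow kernel: the matrix is close to the identity and well conditioned.
cond_narrow = np.linalg.cond(gaussian_kernel_matrix(points, sigma=0.05))

# Wide (smoother) kernel: singular values decay quickly and the
# condition number grows by many orders of magnitude.
cond_wide = np.linalg.cond(gaussian_kernel_matrix(points, sigma=0.5))
```

Comparing `cond_narrow` with `cond_wide` shows how a wider kernel degrades the conditioning of the discrete system.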

2.2 Matrices

Suppose that $A$ is a matrix of dimension $m \times n$ (with $m \ge n$), and $b$ an $m \times 1$ vector. If $m = n$ and the columns of $A$ are linearly independent, $A$ is invertible and the equation

$$Ax = b \qquad (2.2)$$

has a unique solution $x = A^{-1} b$. Otherwise, this system is in general inconsistent, meaning that there is no solution to equation (2.2).

When there is no solution, it is desirable to find the $\hat{x}$ such that $A\hat{x}$ is closest to $b$, i.e., the solution that minimizes $\|Ax - b\|$ (in the Euclidean norm). Geometrically, the point $\hat{b} = A\hat{x}$ should be the projection of $b$ on the column space of $A$, and the error $b - \hat{b}$ must be orthogonal to that space.
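The geometric statement can be checked on a small inconsistent system. This sketch assumes NumPy and an illustrative $3 \times 2$ matrix and right-hand side chosen here (they are not taken from the original text); it computes the least-squares solution and verifies that the residual is orthogonal to the columns of $A$.

```python
import numpy as np

# Illustrative 3 x 2 system with no exact solution.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

# Least-squares solution: minimizes ||A x - b|| in the Euclidean norm.
x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)

b_hat = A @ x_hat          # projection of b on the column space of A
residual = b - b_hat       # error vector

# Orthogonality to the column space: A^T (b - b_hat) = 0
orthogonality = A.T @ residual
```

For this data the normal equations give $\hat{x} = (5, -3)$, so $\hat{b} = (5, 2, -1)$ and the residual is $(1, -2, 1)$.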

Before looking for the desired solution, we introduce a matrix transformation, the Moore-Penrose pseudo-inverse, which provides the usual solution in the least-squares sense.


Definition Pseudo inverse

If $M$ is a matrix with dimensions $m \times n$, the pseudo-inverse matrix of $M$, denoted by $M^+$, is an $n \times m$ matrix such that:

1. $M M^+ M = M$

2. $M^+ M M^+ = M^+$

3. $M^+ M$ and $M M^+$ are both Hermitian.

Corollary 2.6

1. If the columns of $M$ are linearly independent, $M^+ = (M^* M)^{-1} M^*$.

2. If the rows of $M$ are linearly independent, $M^+ = M^* (M M^*)^{-1}$.

3. If both columns and rows are linearly independent, $M$ is invertible and $M^+ = M^{-1}$.

This means that, when $M$ has rank $n$, $M^+$ is a left inverse of $M$, and when the rank is $m$, $M^+$ is a right inverse.
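The definition and the first item of the corollary can be checked numerically. This is a minimal sketch assuming NumPy, its `numpy.linalg.pinv` routine, and a random $5 \times 3$ matrix (which has linearly independent columns with probability one); the helper `satisfies_penrose` is a name introduced here for illustration.

```python
import numpy as np

def satisfies_penrose(M, P, tol=1e-10):
    """Check the conditions of the definition for a candidate pseudo-inverse P."""
    MP, PM = M @ P, P @ M
    return bool(
        np.allclose(M @ P @ M, M, atol=tol)          # M M+ M = M
        and np.allclose(P @ M @ P, P, atol=tol)      # M+ M M+ = M+
        and np.allclose(MP, MP.conj().T, atol=tol)   # M M+ is Hermitian
        and np.allclose(PM, PM.conj().T, atol=tol)   # M+ M is Hermitian
    )

rng = np.random.default_rng(2)
M = rng.standard_normal((5, 3))                      # full column rank (rank n = 3)

M_pinv = np.linalg.pinv(M)
left_formula = np.linalg.inv(M.conj().T @ M) @ M.conj().T   # (M* M)^{-1} M*

is_left_inverse = np.allclose(M_pinv @ M, np.eye(3))
```

In this full column rank case `M_pinv` coincides with `left_formula` and acts as a left inverse, while `M @ M_pinv` is only a projection and not the identity.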

Proposition 2.7 [12]

If $M$ is a matrix and $M^+$ its pseudo-inverse, then:

1. The column space of $M^+$ is the row space of $M$, and vice versa.

2. $(M^+)^+ = M$.

3. $\mathrm{rank}(M^+) = \mathrm{rank}(M)$.

4. $(M^*)^+ = (M^+)^*$.

The next lemma is a version of the well-known result in least-squares approximation [1], given in the context of the pseudo-inverse [12].

Lemma 2.8 [12]

If $Mx = b$ is an inconsistent linear problem, where $M$ is a matrix with dimension $(m, n)$ with $m > n$, the least-squares solution is $x = M^+ b$, where $M^+$ is the Moore-Penrose pseudo-inverse of $M$.

This means that if we have an inconsistent linear problem $Mx = b$, it is very simple to find the best solution (in the least-squares sense): we just have to calculate the pseudo-inverse of the matrix $M$.


Pseudo-inverse and Singular Value Decomposition

The singular value decomposition states that any $(m \times n)$ matrix $M$ can be factored into three matrices such that

$$M = Q_1 \Sigma Q_2^*,$$

where $Q_1$ is an $(m \times m)$ unitary matrix (orthogonal, in the real case), $Q_2$ is an $(n \times n)$ unitary matrix, and $\Sigma$ is an $(m \times n)$ matrix that is essentially diagonal (meaning that it is zero everywhere, with possible exceptions at the $\Sigma_{ii}$ entries). These entries $\Sigma_{ii}$ are called the singular values of $M$ [11]; they are the square roots of the eigenvalues of $M^* M$.

Theorem 2.9 [12]

Given any matrix $M$, the pseudo-inverse $M^+$ is given by

$$M^+ = Q_2 \Sigma^+ Q_1^*,$$

where $\Sigma^+$ is an $(n \times m)$ matrix, essentially diagonal, with $\Sigma^+_{ii} = 1/\Sigma_{ii}$ whenever $\Sigma_{ii} \neq 0$.

Proof Remember that $M^+$ gives the least-squares solution of a system $Mx = b$. Since $Q_1$ is unitary, multiplying by $Q_1^*$ preserves the norm:

$$\|Mx - b\| = \|Q_1 \Sigma Q_2^* x - b\| = \|\Sigma Q_2^* x - Q_1^* b\|.$$

Let $y = Q_2^* x = Q_2^{-1} x$, which has the same length as $x$. Then we want to minimize $\|\Sigma y - Q_1^* b\|$, and the optimal solution is $y = \Sigma^+ Q_1^* b$. Therefore

$$x = Q_2 y = Q_2 \Sigma^+ Q_1^* b, \qquad \text{so} \qquad M^+ = Q_2 \Sigma^+ Q_1^*.$$
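The formula of Theorem 2.9 translates almost literally into code. The following is a sketch assuming NumPy; the function name `pinv_via_svd` and the relative threshold used to decide which singular values count as nonzero are assumptions made here, since in floating point an exact test $\Sigma_{ii} \neq 0$ is not reliable.

```python
import numpy as np

def pinv_via_svd(M, rel_tol=1e-12):
    """Pseudo-inverse M+ = Q2 Sigma+ Q1*, built from the full SVD M = Q1 Sigma Q2*."""
    Q1, s, Q2_h = np.linalg.svd(M, full_matrices=True)   # Q2_h holds Q2*
    m, n = M.shape
    Sigma_plus = np.zeros((n, m))
    threshold = rel_tol * s.max() if s.size else 0.0
    for i, sv in enumerate(s):
        if sv > threshold:                # invert only the nonzero singular values
            Sigma_plus[i, i] = 1.0 / sv
    return Q2_h.conj().T @ Sigma_plus @ Q1.conj().T

# Full-rank example, compared with NumPy's own pseudo-inverse.
rng = np.random.default_rng(3)
M_full = rng.standard_normal((4, 3))
P_full = pinv_via_svd(M_full)

# Rank-deficient example (second column is twice the first).
M_rank1 = np.array([[1.0, 2.0],
                    [2.0, 4.0],
                    [3.0, 6.0]])
P_rank1 = pinv_via_svd(M_rank1)
```

Unlike the closed formulas of Corollary 2.6, this construction also applies to `M_rank1`, where neither $M^* M$ nor $M M^*$ is invertible.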

Tikhonov Regularization

When the matrix $M$ is ill conditioned (some singular values are close to zero), even if the inverse exists, the solution of a system may be seriously perturbed by numerical rounding errors [1].

One way to circumvent this problem is to consider a perturbation of the least-squares system, obtained by adding a small constant term $\tau$ to the diagonal of $M^* M$, meaning that

$$(M^* M)\, x = M^* b \quad \text{leads to} \quad (\tau I + M^* M)\, x_\tau = M^* b.$$

It is clear that the solution $x_\tau$ reduces to $x$ when $\tau = 0$. This process is called Tikhonov regularization. The Tikhonov pseudo-inverse is now given by

$$M^+_\tau = (\tau I + M^* M)^{-1} M^*,$$

for some chosen $0 \le \tau \ll 1$, and it coincides with the classical Moore-Penrose pseudo-inverse when $\tau = 0$ (provided $M^* M$ is invertible).

There is a balance in this regularization technique: if the value of $\tau$ is too small the ill-conditioning problems remain, while if $\tau$ is too large the error $\|x - x_\tau\|$ may not be negligible. There are some criteria for the choice of $\tau$, such as the L-curve and the Morozov discrepancy principle, but these topics are outside the scope of this work.
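The regularized pseudo-inverse can be sketched in a few lines. This example assumes NumPy, a random $6 \times 3$ matrix and an illustrative value $\tau = 10^{-2}$; the function name `tikhonov_pinv` is introduced here and the choice of $\tau$ is arbitrary, not a recommendation from the original text.

```python
import numpy as np

def tikhonov_pinv(M, tau):
    """Tikhonov pseudo-inverse (tau I + M* M)^{-1} M*, computed with a linear solve."""
    n = M.shape[1]
    M_h = M.conj().T
    return np.linalg.solve(tau * np.eye(n) + M_h @ M, M_h)

rng = np.random.default_rng(4)
M = rng.standard_normal((6, 3))
b = rng.standard_normal(6)

x_0 = tikhonov_pinv(M, 0.0) @ b        # tau = 0: plain least-squares solution
x_tau = tikhonov_pinv(M, 1e-2) @ b     # regularized solution

# Regularization damps the solution: its norm does not increase with tau.
norm_0, norm_tau = np.linalg.norm(x_0), np.linalg.norm(x_tau)
```

With $\tau = 0$ and a full column rank matrix the result matches the Moore-Penrose pseudo-inverse, while a positive $\tau$ trades a small bias for a solution of smaller norm.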


Chapter 3

Functions and (Cartesian) Images

The aim of this chapter is to introduce some basic concepts of image processing. It also clarifies the differences between local and global approaches to image reconstruction. Finally, one usual method for geometric image transforms is described.

The standard model of a digital image is a vector-valued function from a rectangle of integers ($M \times N$) into a box in $\mathbb{R}^3$. The domain corresponds to the pixel positions and the range to the three color components (red, green and blue).

This model is used for cameras, photos, pictures, and so on. Sometimes, for simplicity or for visual effect, the range is not the full set of color components but only a gray brightness intensity, which reduces the range to an interval.

Definition Cartesian image

$c$ is a Cartesian image if $c$ is a function such that

$$c : \{0, 1, \dots, M - 1\} \times \{0, 1, \dots, N - 1\} \longrightarrow \{0, \dots, 255\}.$$

This discrete setting allows us to represent the Cartesian image in matrix form:

$$\begin{pmatrix}
c(0, 0) & c(0, 1) & \cdots & c(0, N - 1) \\
c(1, 0) & c(1, 1) & \cdots & c(1, N - 1) \\
\vdots & \vdots & \ddots & \vdots \\
c(M - 1, 0) & c(M - 1, 1) & \cdots & c(M - 1, N - 1)
\end{pmatrix}$$

Each element $(i, j)$ of the domain is called a pixel.


For technical reasons the range is limited, and the value $256 = 2^8$ is chosen because it corresponds to 1 byte per pixel.

One of the most important relationships between pixels is the neighborhood relation. It is the basis for many methods and applications, such as interest point detection, edge detection, segmentation and object recognition. For instance, a simple method for edge detection is to look in the image for adjacent pixels with a large difference in brightness.

Definition Cartesian neighbor

Let $(i, j)$ be a cartesian pixel. The set of the neighbours of $(i, j)$ is

$$N(i, j) = \{ (i', j') : (i', j') = (i + k, j + l);\ k, l = -1, 0, 1;\ (i', j') \neq (i, j) \}.$$

Figure 3.1: Cartesian neighborhood
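The neighbour set of the definition above is straightforward to enumerate. The sketch below is plain Python; the function name `cartesian_neighbors` and the optional clipping to an $M \times N$ image are assumptions added here, since the definition itself does not say how pixels on the image border are handled.

```python
def cartesian_neighbors(i, j, M=None, N=None):
    """Return the 8-connected neighbours of pixel (i, j).

    If the image size (M, N) is given, neighbours falling outside
    {0..M-1} x {0..N-1} are discarded (an assumption, not part of the definition).
    """
    result = []
    for k in (-1, 0, 1):
        for l in (-1, 0, 1):
            if (k, l) == (0, 0):
                continue                       # (i', j') must differ from (i, j)
            ii, jj = i + k, j + l
            inside = M is None or N is None or (0 <= ii < M and 0 <= jj < N)
            if inside:
                result.append((ii, jj))
    return result

interior = cartesian_neighbors(5, 5)           # 8 neighbours
corner = cartesian_neighbors(0, 0, M=4, N=4)   # 3 neighbours after clipping
```

An interior pixel has eight neighbours; with clipping, a corner pixel keeps only three.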

A typical image has a considerable quantity of information (usually associated with some noise), but part of it is dispensable for human perception. For instance, in figure 3.2 there are no details (hands, face, hair, clothes, ...) and no color, but it is clear that these few lines and a circle represent a human being.

3.1 Sampling and Reconstruction

Today's cameras are capable of capturing 10 megapixels. Since this amount of information is hard to process in real time, the image should be sampled.

Figure 3.2: Stereotyped human


Once the initial image is sampled, the recovery of the original image depends essentially on two factors: the distribution of the sampling points and the approximation/interpolation method. For instance, if the sampling points are concentrated in one corner, it is more difficult to reconstruct the opposite corner. But even with a suitable sampling set, if the interpolation/approximation method is not appropriate the reconstructed image will be distorted.

This work is focused on the approximation methods for a specific sampling distribution, which is therefore kept constant throughout the work. The framework presented here does not depend on the sampling, although some (biologically inspired) conditions are imposed.

The main methods of sampling and reconstruction can be divided into two classes: local approximations and global approximations. Each has its own benefits and disadvantages. Both approaches are presented in the following sections.

Local Approximation

For reconstruction, one possibility is to consider a local interpolation/approximation. Here we illustrate one simple method that produces good results (locally). The main idea is to subdivide the original matrix into smaller parts, leading to a piecewise interpolation/approximation. With

\[ P = \{0, \dots, M - 1\} \times \{0, \dots, N - 1\} \]

the original domain, we define K parts $P_k \subset P$:

\[ P = P_1 \cup \dots \cup P_K . \]

The original image c is then approximated by functions $f_k$ defined on each $P_k$, such that

\[ c|_{P_k} = f_k . \]

This local approximation may be given by a very simple linear spline interpolation on the sampling points.

This interpolation represents the image exactly at the sampling points, but since the polynomials have degree 1, the reconstructed image can show corners that do not exist in the original, and the variation of the image value is not smooth.

It is worth noting that $C^1$ piecewise polynomial interpolation leads to rather complicated finite elements, such as Argyris elements, which are based on 5th degree polynomials with 21 degrees of freedom [3].
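The behaviour of linear spline interpolation can be illustrated on a single image row. This is only a one-dimensional sketch using NumPy's `np.interp`; the function name `linear_spline_row` and the choice of sample positions are assumptions for the example, not part of the thesis.

```python
import numpy as np

def linear_spline_row(sample_positions, sample_values, length):
    """Reconstruct a row of `length` pixels from a few samples by
    piecewise-linear interpolation between consecutive sampling points."""
    return np.interp(np.arange(length), sample_positions, sample_values)

# Three samples of a 9-pixel row: the reconstruction is exact at the
# samples and has a corner at the middle one.
row = linear_spline_row([0, 4, 8], [10.0, 30.0, 20.0], 9)
```

The reconstructed row passes through the three sampled values, and the slope changes abruptly at position 4, which is the kind of artificial corner mentioned above.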


Global Approximation

Another possibility for approximating the function c is a global approach, where c is described as a linear combination of regular functions, that is,

\[ c(i, j) = \sum_{m=1}^{P} \alpha_m \phi_m(i, j), \]

where $\phi_1, \dots, \phi_P$ are the basis functions.

But, for a good approximation we must guarantee that:

• The functions $\phi_m$ are regular on [0, M − 1] × [0, N − 1],

• The number of functions P is large enough.

If P = N · M, it is possible to reconstruct the cartesian image perfectly (but in this case there is no data reduction) by solving a linear system with N × M unknowns (which is very costly for usual image sizes).

Since the goal is to reduce the data, one possibility is to consider a smaller number of functions, P < M · N, and then solve the system in the least squares sense. That is, using an indexation (I(n), J(n)) of the cartesian coordinates, we define a matrix φ whose entry (x, y) is

\[ \phi_{x,y} = \phi_x(I(y), J(y)) \]

and, writing $c_y = c(I(y), J(y))$, the least squares solution is obtained by solving the system

\[ \phi^* \phi f = \phi^* c \]

where $f = (f_1, f_2, \dots, f_P)$ is the vector of coefficients and $\phi^*$ is the conjugate transpose of φ.

The usual way to solve this system is to use the Moore-Penrose pseudo-inverse of φ. This method is not very robust when the matrix $\phi^* \phi$ is ill conditioned. The Moore-Penrose pseudo-inverse was introduced in chapter 2, where we also presented a very simple solution for ill conditioned matrices.
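A least squares fit of this kind can be sketched with NumPy. The Gaussian basis, the function names and the layout of the design matrix (one row per pixel, one column per basis function, so that `A @ coeffs` approximates the image) are assumptions chosen for the illustration; `np.linalg.lstsq` is used here in place of an explicit pseudo-inverse.

```python
import numpy as np

def gaussian_basis_matrix(shape, centers, sigma):
    """Design matrix A with A[y, x] = phi_x evaluated at pixel number y.

    Assumptions: row-major pixel numbering and isotropic Gaussian basis
    functions with a common width sigma.
    """
    m, n = shape
    ii, jj = np.meshgrid(np.arange(m), np.arange(n), indexing="ij")
    pixels = np.stack([ii.ravel(), jj.ravel()], axis=1)            # (m*n, 2)
    diff = pixels[:, None, :] - np.asarray(centers)[None, :, :]    # (m*n, P, 2)
    return np.exp(-(diff ** 2).sum(axis=2) / (2.0 * sigma ** 2))   # (m*n, P)

def fit_coefficients(image, centers, sigma):
    """Least squares coefficients and the corresponding approximation."""
    A = gaussian_basis_matrix(image.shape, centers, sigma)
    coeffs, *_ = np.linalg.lstsq(A, image.ravel(), rcond=None)
    return coeffs, (A @ coeffs).reshape(image.shape)
```

When the image lies exactly in the span of the P basis functions the fit recovers the coefficients; otherwise it returns the closest approximation in the least squares sense.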

3.2 Image Geometrical Transforms

In computer vision it is sometimes useful to simulate geometric image transforms, i.e., to approximate, from an initial image, the effect of a transform in the real world. These methods are used in particular for predicting motion or estimating motion between two consecutive video frames.


Independently of the transform (Euclidean, affine, projective, ...) being approximated, the method is always the same: first the coordinates are transformed, then the image values are transferred, i.e., if (i′, j′) = T(i, j) (footnote 1) then c(i′, j′) = c(i, j).

Figure 3.3: Rotation of one image

It is well known from basic geometry that simple transforms of the plane $\mathbb{R}^2$ are described by a matrix. For convenience, the homogeneous coordinate system is used, meaning that a pixel (i, j) is represented by the vector $[i \; j \; 1]^T$.

Consequently, to simulate a simple translation $(t_h, t_v)$ (horizontal and vertical, respectively) on pixels, the coordinates (i′, j′) of each pixel are given by:

\[ \begin{bmatrix} i' \\ j' \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & t_h \\ 0 & 1 & t_v \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} i \\ j \\ 1 \end{bmatrix} \]

In this thesis, we work with similarity transforms, meaning that only translations, rotations and scalings are introduced (footnote 2).

In this case, the matrix T has the form:

\[ T = \begin{bmatrix} s \cos(\alpha) & s \sin(\alpha) & t_h \\ -s \sin(\alpha) & s \cos(\alpha) & t_v \\ 0 & 0 & 1 \end{bmatrix} \]

where s and α are the scale and rotation values.

1 Of course there are some boundary problems in this definition. T(i, ·) could be larger than M, or there could be an i′ such that $\nexists\, i : i' = T(i, \cdot)$. In this work, when this kind of problem occurs, the image at that pixel is considered undefined.

2 Since an image is a 2-dimensional projection of the 3-dimensional real world, the pixel coordinates (x, y) are usually represented by their homogeneous coordinates (x, y, 1). This allows projective transforms (8 degrees of freedom) to be simulated on the image plane.


This result can be generalized to other transforms such as Euclidean, affine and projective ones [10].
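The similarity matrix in homogeneous coordinates can be sketched as follows. The function names are hypothetical, and the sketch only maps coordinates: rounding to integer pixels and discarding positions that fall outside the image, as discussed in the footnote on boundary problems, are left out.

```python
import numpy as np

def similarity_matrix(s, alpha, th, tv):
    """3x3 matrix combining scale s, rotation alpha and translation (th, tv)."""
    c, si = np.cos(alpha), np.sin(alpha)
    return np.array([[ s * c,  s * si, th],
                     [-s * si, s * c,  tv],
                     [ 0.0,    0.0,    1.0]])

def transform_pixels(T, pixels):
    """Apply T to a list of (i, j) pixels written as rows [i, j, 1]."""
    pixels = np.asarray(pixels, dtype=float)
    homogeneous = np.column_stack([pixels, np.ones(len(pixels))])
    return (homogeneous @ T.T)[:, :2]
```

With s = 1 and α = 0 the matrix reduces to the pure translation shown earlier; the image values are then transferred with c(i′, j′) = c(i, j).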

3.3 Conclusions

In this chapter, a formal description of a digital image was introduced (a simple vector-valued function over a rectangle). Digital image representations contain a lot of data that is unnecessary for human perception, meaning that one can sample the image while keeping the relevant information.

For the sampling-reconstruction process, the two main classes of approaches were introduced: local vs. global. Some simple examples were given, and likely issues were presented.

It was also explained how to induce geometrical transforms on the image plane using simple matrix operations.


Chapter 4

Foveation with Super Pixel Methods

In this chapter, one of the most relevant super pixel methods is presented [14]. It will be seen that this method produces the desired image compression while keeping the central information, but its reconstruction of the original image is not very good; moreover, image processing methods are difficult to implement on it.

In humanoid robot systems, real-time interaction with the environment is desirable; for that, image processing and interpretation should be precise and immediate. Despite the software and hardware developments of recent years, the usual cartesian images are not enough: the processing algorithms are precise but slow.

On the other hand, humans rely mostly on visual information when they interact with the environment, and in the real world this information is constantly changing and transforming. How is the human brain able to deal with such an amount of information? What image representations and processing models should be employed in humanoid robots to operate in real environments?

One possible solution is to make the robot work with foveated images, i.e., images with more pixels at the center than on the periphery. These images are used as representations of the cortical image. The better this model, the closer we get to human vision.

Most foveation methods belong to the class of super pixel methods, i.e., methods that treat each foveated pixel as a set of cartesian pixels, whose value is the average of the corresponding cartesian pixels.

In this chapter, the first section covers the biological inspiration, followed by the computational implementation of the foveation process; the third section explains how to introduce geometrical image transforms; finally, some conclusions are presented.


4.1 Human Eye Structure

The brain's processing of light signals starts in the eyes, where the photoreceptors of the retina absorb the information and pre-process it. The information leaves the eye through the optic nerve, which sends it to the primary visual cortex, at the back of the brain.

In the primary visual cortex the image is reconstructed, allowing the brain to process it. However, the resolution of the cortical image is not uniform over the field of view: to avoid redundant processing, and consequently a useless waste of energy, there is more information at the center of the image than on the periphery.

This singular image in the brain is a consequence of the structure of the human eye, which can be divided into three main layers: the fibrous tunic, the vascular tunic, and the nervous tunic.

The nervous tunic is the inner sensory layer, which includes the retina, where thousands of photosensitive receptors called rods and cones exist (and others in smaller quantity).

Rods are more sensitive to light changes, shape and fast movement, which is why they are more useful in low-light vision; cones, on the other hand, are sensitive to color and better suited for detecting fine details.

Nevertheless, the concentration of photoreceptors is not constant over the retina. At the back of the retina lies the fovea, an area with a high concentration of cones and no rods.

This irregular distribution of photoreceptors provides higher resolution at the center, but lower color sensitivity on the periphery.

(a) Human eye (b) Rods and Cones distribution on retina

Since the cones are the photoreceptors responsible for small details and color vision, and it is desirable to model robot vision on human vision, we should redistribute the robot's sensors on a grid close to the distribution of the cones. This means considering a “Log-polar model” instead of a space-invariant geometry. This model consists of an irregular sampling (more samples at the center, decreasing logarithmically); while reducing the amount of data, it provides maximum resolution at the image's center without losing information on the periphery.

4.2 From Cartesian to Log-polar Images

In Cartesian images the sensors are distributed on a rectangular grid with M × N sensors. Redistributing the sensors onto a logpolar shape is equivalent to a coordinate transform.

Figure 4.1: Coordinates Distribution

The transform algorithm considers each pixel (x, y) as a complex number z = x + iy, then computes the complex point w = log(z) and normalizes w to the integer array indices for S and R.

However, when the modulus of the complex number z is zero, log(z) has a singularity. To avoid this problem, the mapping w = log(z + α) is applied.

This algorithm is a 3-step procedure; each step is explained and analyzed in the next sections.
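The core mapping w = log(z + α) can be tried out with Python's standard `cmath` module. This is only a sketch of the continuous mapping: the function name is hypothetical, and the normalization to integer indices is not included.

```python
import cmath

def log_polar_point(x, y, alpha):
    """Map the centered pixel (x, y) to w = log(z + alpha), with z = x + iy.

    Returns (log-radius, angle), the real and imaginary parts of w.
    """
    w = cmath.log(complex(x, y) + alpha)
    return w.real, w.imag

# With alpha > 0 the origin is mapped to a finite value.
origin = log_polar_point(0.0, 0.0, 1.0)
```

The real part grows logarithmically with the distance to the shifted center and the imaginary part is the polar angle, which is what makes the sampling dense near the center and sparse on the periphery.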

From Cartesian to Log-polar Coordinates

According to the biological model, the focus point should be at the image center. To ease the construction of the function, the cartesian coordinates are first translated into centered coordinates.

The centered cartesian origin should be at the matrix center $\left( \frac{m+1}{2}, \frac{n+1}{2} \right)$, and it is assumed that the axes define 4 equal quadrants. For this reason, the centered coordinates are defined over N′ × M′, where $N' = \{ -\frac{n}{2}, \dots, 0, \dots, \frac{n}{2} - 1 \}$ and $M' = \{ -\frac{m}{2}, \dots, 0, \dots, \frac{m}{2} - 1 \}$, and the translation function from cartesian coordinates to centered coordinates is:

\[ T : M \times N \to C \]

\[ T(i, j) = \left( \frac{n+1}{2} - j,\ \frac{m+1}{2} - i \right) \]

From now on, the centered coordinates T(i, j) will be denoted as (x, y).

Once the centered coordinates are constructed, the logpolar coordinates should be 4-way symmetric. Once again, to simplify, the logpolar transformation is constructed over the first quadrant (footnote 1) and then extended to the other quadrants.

The first-quadrant functions will be denoted $S_Q$ and $R_Q$, and their extensions $S_E$ and $R_E$.

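Assuming that M ≥ N, the minimum and maximum values of the log function should occur at the origin (0, 0) and where the outermost ring intersects the y axis, respectively, so

\[ u_{min} = \log \left| \left( \alpha + \tfrac{1}{2},\ \tfrac{1}{2} \right) \right| \]

\[ u_{max} = \log \left| \left( \alpha + \tfrac{1}{2},\ \left| \tfrac{m}{2} - \tfrac{1}{2} \right| \right) \right| \]

are defined as constants used to normalize the value of $R_Q(x, y)$ (footnote 2).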

On the other hand, the minimum spoke angle occurs on the y axis, and it is given by

\[ v_{min} = \arctan \left( \alpha + \tfrac{1}{2},\ \left| \tfrac{m}{2} - \tfrac{1}{2} \right| \right) \]

and the maximum is π.

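To calculate the value $R_Q(x, y)$, we first need to calculate

\[ \rho = \left| \left( x + \alpha + \tfrac{1}{2},\ y + \tfrac{1}{2} \right) \right| \]

and then normalize:

\[ R_Q : Q \to R \]

\[ R_Q(x, y) = \left\lfloor \frac{r}{2} \cdot \frac{\log(\rho) - u_{min}}{u_{max} - u_{min}} \right\rfloor \]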
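The ring index can be sketched directly from these formulas. The function name and the use of `None` for undefined values are assumptions of this illustration; only the first-quadrant ring index is covered, and the thesis treats values above r/2 as undefined.

```python
import math

def ring_index(x, y, m, r, alpha):
    """First-quadrant ring index R_Q(x, y) for centered coordinates (x, y).

    Returns None where the index exceeds r/2 (treated as undefined).
    """
    u_min = math.log(math.hypot(alpha + 0.5, 0.5))
    u_max = math.log(math.hypot(alpha + 0.5, abs(m / 2 - 0.5)))
    rho = math.hypot(x + alpha + 0.5, y + 0.5)
    ring = math.floor((r / 2) * (math.log(rho) - u_min) / (u_max - u_min))
    return ring if ring <= r / 2 else None
```

With m = 64, r = 16 and α = 1, the origin falls in ring 0 and the pixel (0, 31), where the outermost ring meets the y axis, falls in ring r/2 = 8.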

In the same way, one first calculates

\[ \theta = \arctan \left( x + \alpha + \tfrac{1}{2},\ y + \tfrac{1}{2} \right) \]

and the resulting $S_Q$ function is

\[ S_Q : Q \to S \]

\[ S_Q(x, y) = \left\lfloor s \, \frac{\theta - \theta_{min}}{\pi - 2\theta_{min}} \right\rfloor \]

If $R_Q(x, y) > \frac{r}{2}$, both values $R_Q(x, y)$ and $S_Q(x, y)$ are undefined.

1 By definition, the first quadrant is the set Q = {(x, y) ∈ N′ × M′ : x ≥ 0 and y ≥ 0}.

2 The value 1/2 is added to compute the function over the center of each pixel instead of the corner.

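The extended functions $S_E$ and $R_E$ are defined as follows:

\[ S_E : N' \times M' \to S \]

\[ S_E(x, y) = \begin{cases} S_Q(x, y) & \text{if } x \ge 0 \text{ and } y \ge 0 \\ S_Q(x, -y) & \text{if } x \ge 0 \text{ and } y < 0 \\ S_Q(-x, y) & \text{if } x < 0 \text{ and } y \ge 0 \\ S_Q(-x, -y) & \text{if } x < 0 \text{ and } y < 0 \end{cases} \]

\[ R_E : N' \times M' \to R \]

\[ R_E(x, y) = \begin{cases} R_Q(x, y) & \text{if } x \ge 0 \text{ and } y \ge 0 \\ R_Q(x, -y) & \text{if } x \ge 0 \text{ and } y < 0 \\ R_Q(-x, y) & \text{if } x < 0 \text{ and } y \ge 0 \\ R_Q(-x, -y) & \text{if } x < 0 \text{ and } y < 0 \end{cases} \]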

Considering the aforementioned construction, one gets the resulting functions S(i, j) and R(i, j). Writing $(x, y) = T(i, j) = \left( \frac{n+1}{2} - j,\ \frac{m+1}{2} - i \right)$:

\[ S : M \times N \to S \]

\[ S(i, j) = \begin{cases} S_E(x, y) & \text{if } x \ge 0 \text{ and } y \ge 0 \\ s - 1 - S_E(x, -y) & \text{if } x \ge 0 \text{ and } y < 0 \\ S_E(-x, y) & \text{if } x < 0 \text{ and } y \ge 0 \\ s - 1 - S_E(-x, -y) & \text{if } x < 0 \text{ and } y < 0 \end{cases} \]

\[ R : M \times N \to R \]

\[ R(i, j) = \begin{cases} \frac{r}{2} - R_E(x, y) - 1 & \text{if } x \ge 0 \text{ and } y \ge 0 \\ \frac{r}{2} - R_E(x, -y) - 1 & \text{if } x \ge 0 \text{ and } y < 0 \\ \frac{r}{2} + R_E(-x, y) & \text{if } x < 0 \text{ and } y \ge 0 \\ \frac{r}{2} + R_E(-x, -y) & \text{if } x < 0 \text{ and } y < 0 \end{cases} \]

Figure 4.2: Log-polar pixels distribution over one image

From Cartesian to Log-polar Images

Once the logpolar coordinates are constructed, it is very easy to build the logpolar image. The main idea is that the brightness of each logpolar pixel f(u, v) should be the average of the values c(i, j) for which S(i, j) = u and R(i, j) = v.

So, first the level-set (indicator) function $1^f_k(x, y)$ is defined:

\[ 1^f_k : \mathbb{N} \times \mathbb{N} \to \{0, 1\} \]

\[ 1^f_k(x, y) = \begin{cases} 1 & \text{if } f(x, y) = k \\ 0 & \text{otherwise} \end{cases} \]

With this function, it is easy to generate an area function a that counts the number of cartesian pixels (i, j) which have logpolar coordinate (u, v):

\[ a : S \times R \to \mathbb{N} \]

\[ a(u, v) = \sum_{i,j} 1^S_u(i, j) \cdot 1^R_v(i, j) \]

Finally, it is possible to define a logpolar image:

Definition Log-polar image

Let c be a cartesian image over M × N, and let S = {0, 1, ..., s − 1} and R = {0, 1, ..., r − 1} be subsets of $\mathbb{N}$ (called bases of logpolar coordinates). Let D ⊂ S × R be such that D = {(u, v) : (u, v) = (S(i, j), R(i, j)) for some (i, j) ∈ M × N}.

One can say that f is the logpolar image of c if f is a function such that:

\[ f : D \to \{0, \dots, 255\} \]

\[ f(u, v) = \frac{1}{a(u, v)} \sum_{i,j} c(i, j) \cdot 1^S_u(i, j) \cdot 1^R_v(i, j) \]

D is called the Log-polar domain, and each element (u, v) is called a logpolar pixel.
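The averaging in this definition can be sketched with NumPy, assuming the label maps S(i, j) and R(i, j) are already available as integer arrays of the same shape as the image, with every label inside its range. The function name and the use of NaN for logpolar pixels outside D are choices made for this illustration.

```python
import numpy as np

def superpixel_average(c, S, R, s, r):
    """Return (f, a): the mean of c over each logpolar pixel and its area.

    c, S, R are arrays of equal shape; S holds values in 0..s-1 and
    R values in 0..r-1. Logpolar pixels with area 0 are set to NaN.
    """
    labels = (S * r + R).ravel()                      # flat label per cartesian pixel
    area = np.bincount(labels, minlength=s * r)       # a(u, v)
    total = np.bincount(labels, weights=c.ravel().astype(float), minlength=s * r)
    f = np.full(s * r, np.nan)
    covered = area > 0
    f[covered] = total[covered] / area[covered]
    return f.reshape(s, r), area.reshape(s, r)
```

The result is returned as floating point means; rounding to the range {0, ..., 255} is left to the caller.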

Once the foveated image is constructed, the human brain works over it.

In the next section, image transforms on these images will be simulated. First, a method for reconstructing the cartesian image is introduced, followed by a representation of the logpolar coordinates (the connectivity graph).

Reconstruction

Since the amount of information is reduced, it is impossible to reconstruct the cartesian image perfectly from the logpolar one (as already seen in chapter 3). However, the reconstruction of the center should be very precise; since the size of the Log-polar pixels increases with eccentricity, the reconstructed image should be further from the original on the periphery.

The reconstruction method is very simple: each cartesian pixel takes the value of the corresponding logpolar one.

Definition Inverse Log-polar image

If f is a logpolar image,

\[ f^{-1} : C \to \{0, \dots, 255\} \]

\[ f^{-1}(i, j) = \sum_{u=0,\, v=0}^{s-1,\, r-1} f(u, v) \cdot 1^S_u(i, j) \cdot 1^R_v(i, j) \]


Note that this method is even simpler than the one presented in chapter 3 for local approximations. It amounts to saying that the functions $f_{ij}$ are not in $Q_1$ but in $Q_0$, i.e., they are polynomials of degree 0 with coefficients in $\mathbb{Q}$.
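Because each cartesian pixel belongs to exactly one logpolar pixel, the double sum in the definition of the inverse image keeps a single term, so the reconstruction reduces to a table lookup. A minimal NumPy sketch, assuming integer label maps S and R with the same shape as the cartesian image (the function name is hypothetical):

```python
import numpy as np

def reconstruct(f, S, R):
    """Piecewise-constant reconstruction: pixel (i, j) takes f[S[i, j], R[i, j]]."""
    return f[S, R]

f = np.array([[2.0, np.nan], [5.0, 7.0]])   # logpolar image, NaN outside the domain
S = np.array([[0, 0], [1, 1]])
R = np.array([[0, 0], [0, 1]])
restored = reconstruct(f, S, R)
```

All cartesian pixels that share a logpolar pixel receive the same value, which is the degree-0 behaviour described above.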

(a) Cartesian image (b) Reconstructed image

Figure 4.3: Cartesian and reconstructed Log-polar image

Note on the data reduction: the cartesian image has dimensions M = 964 and N = 1286, meaning that there are 1239704 cartesian pixels. On the other hand, the logpolar image has S = 128 and R = 64, so there are no more than 8192 logpolar pixels, a reduction by a factor of about 150. Note also that the main information (two people on a bicycle) is not lost, but the people in the background are not perceptible.

The transformation of a cartesian image into a logpolar one depends on three parameters: α, s and r. Varying these parameters produces the expected result, i.e., when s and r increase, the reconstructed logpolar image is closer to the cartesian one.

(a) Reconstruction of the logpolar image, α = 1, s = 32, r = 16 (b) Reconstruction of the logpolar image, α = 4, s = 128, r = 64

Figure 4.4: Resulting reconstructed images with different parameters

Now the effect of foveation is clear: the central information is maintained, but the periphery is blurred. The data reduction is high.


Connectivity Graph

Many image processing algorithms are based on the neighborhood relationship between pixels: edge detection, interest point detection, template matching, etc. Therefore, the next step in logpolar image processing should be to define and analyze this relation. In the cartesian domain, this relationship is easy to identify and describe. In the logpolar domain, however, it is not so simple. In the pixel distribution of figure 4.2, we can see three different situations:

1. N(8,18) = {(8, 17) ; (9, 17) ; (9, 18) ; (9, 19) ; (8, 19) ; (7, 19) ; (7, 18) ; (7, 17)}

2. N(15,1) = {(15, 2) ; (14, 2) ; (14, 1) ; (14, 0) ; (15, 0) ; (15, 19) ; (15, 18) ; (15, 17)}

3. N(13,17) = {(14, 17) ; (14, 18) ; (13, 18) ; (12, 8) ; (12, 17) ; (13, 16) ; (14, 16)}

In the first case, the neighbors of the pixel (8, 18) are the usual ones. Looking closer, however, we see that in the second case (15, 19) is a neighbor of (15, 1), which is not a trivial relationship, and in the third case the pixel (13, 17) does not even have 8 neighbors.

So we have to be careful in defining this relation; specifically, on the image's meridian we must be more precise.

Definition Meridian

Let (u, v) be a logpolar pixel. One can say that (u, v) is over the meridian if

\[ \exists\, (i, j),\, y : \ S(i, j) = u,\ R(i, j) = v \ \text{and}\ \big( T(i, j) = (0, y) \ \text{or}\ T(i, j) = (-1, y) \big) \]

where T(i, j) denotes the centered coordinates of the pixel (i, j).

Definition Log-polar neighbour

Let (u, v) be a logpolar pixel. If (u, v) is a meridian pixel, (u′, v′) is a neighbor of (u, v) if

\[ \exists\, i, j, i', j' \ \text{such that } (i, j) \text{ is a neighbour of } (i', j'), \quad S(i, j) = u,\ R(i, j) = v \quad \text{and} \quad S(i', j') = u',\ R(i', j') = v' \]

If (u, v) is not a meridian pixel, then (u′, v′) is a neighbor of (u, v) if

\[ \exists\, k, l \in \{-1, 0, 1\} : \ (u', v') = (u + k, v + l);\ (u', v') \neq (u, v) \]

In logpolar images, the neighborhood relationship is difficult to visualize and represent. Algorithms based on this pixel relation should be adapted to this new paradigm.


One clear way to represent this relation is to build a structure known as a connectivity graph (CG). In this graph, each vertex represents a logpolar pixel, and there is an edge between two vertices if the respective logpolar pixels are neighbors.

Definition Connectivity Graph

Let D be a logpolar domain over S × R. A Connectivity Graph CG = (V, E) is a pair such that V = D and E ⊂ D × D, where

\[ E = \{ ((u, v), (u', v')) : (u', v') \text{ is a logpolar neighbour of the pixel } (u, v) \} \]

Figure 4.5: Connectivity graph of a logpolar image
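One way to obtain candidate edges of such a graph is to scan pairs of adjacent cartesian pixels and connect the logpolar pixels they belong to. The sketch below applies this cartesian-adjacency rule, which the definition states for meridian pixels, to every pixel; that simplification is an assumption of the example and not the rule given for non-meridian pixels. The function name and the integer label maps S and R are also hypothetical.

```python
import numpy as np

def adjacency_edges(S, R):
    """Directed edges between logpolar pixels containing 8-adjacent cartesian pixels."""
    m, n = S.shape
    edges = set()
    # Four of the eight offsets are enough; the opposite ones follow by symmetry.
    for di, dj in [(0, 1), (1, 0), (1, 1), (1, -1)]:
        for i in range(m - di):
            for j in range(max(0, -dj), min(n, n - dj)):
                a = (int(S[i, j]), int(R[i, j]))
                b = (int(S[i + di, j + dj]), int(R[i + di, j + dj]))
                if a != b:
                    edges.add((a, b))
                    edges.add((b, a))
    return edges
```

Each edge is stored in both directions, so the set can be used directly as the relation E of the graph.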

This Connectivity Graph allows image processing operations such as translation,rotation and scaling in a simple way. As an example, the translation will be studied.

4.3 Image Transforms

On cartesian images it is possible to introduce geometrical transforms (translation, rotation and scaling) with a very simple method. Log-polar images are rotation and scale invariant, but how can one translate a logpolar image?

In this section, a method for simulating translations on logpolar images is introduced. The main idea is to construct a translation graph based on the connectivity graph of one logpolar image (from now on called the reference image).

Definition Translation Graph

Let CG = (V, E) be a connectivity graph over a logpolar domain D. A Translation Graph $G_T = (V_T, E_T)$ related to the translation T = (∆i, ∆j) is a graph such that $V_T = V$ and $E_T \subset D \times D$, where

\[ E_T = \{ ((u, v), (u', v')) : \exists\, (i, j) \ \ 1^S_u(i, j) \cdot 1^R_v(i, j) \cdot 1^S_{u'}(i - \Delta i, j - \Delta j) \cdot 1^R_{v'}(i - \Delta i, j - \Delta j) = 1 \} \]


This new graph has a completely different meaning from the connectivity graph: instead of neighborhood relations, each edge represents the translation of cartesian pixels over the logpolar coordinates.

Associated with each edge ((u, v), (u′, v′)) of $E_T$, one should count the number of cartesian pixels that translate from (u, v) to (u′, v′). This value ($K_T$) will be important for calculating the translated logpolar pixel.

\[ K_T((u, v), (u', v')) = \sum_{i,j} 1^S_u(i, j) \cdot 1^R_v(i, j) \cdot 1^S_{u'}(i - \Delta i, j - \Delta j) \cdot 1^R_{v'}(i - \Delta i, j - \Delta j) \]

Finally, the translated logpolar image is obtained with:

\[ L_T : D \to \mathbb{Q} \]

\[ L_T(u, v) = \frac{1}{\sum K_T((u, v), (u', v'))} \sum \left( K_T((u, v), (u', v')) \cdot L(u, v) \right) \]

Note the similarity between the functions L and $L_T$: both are weighted means of pixel brightness; the first is constructed from the cartesian pixels, the second from the logpolar pixels before the translation.

This shows the importance of the connectivity graph and how it can be useful in logpolar images.
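The weights $K_T$ and the resulting weighted mean can be sketched as follows. The function names and the dictionary representation are hypothetical. The sketch reads the weighted mean as being taken over the brightness of the source logpolar pixels (u′, v′), i.e. the pixels before the translation; this reading is an assumption of the example. Cartesian pixels whose translated position falls outside the image are ignored.

```python
import numpy as np

def translation_weights(S, R, di, dj):
    """K_T as a dict {((u, v), (u', v')): count} for the translation (di, dj)."""
    m, n = S.shape
    counts = {}
    for i in range(m):
        for j in range(n):
            i0, j0 = i - di, j - dj
            if 0 <= i0 < m and 0 <= j0 < n:
                key = ((int(S[i, j]), int(R[i, j])),
                       (int(S[i0, j0]), int(R[i0, j0])))
                counts[key] = counts.get(key, 0) + 1
    return counts

def translate_logpolar(L, weights):
    """Weighted mean of the source pixels (u', v') for each target pixel (u, v)."""
    num = np.zeros(L.shape)
    den = np.zeros(L.shape)
    for (dst, src), k in weights.items():
        num[dst] += k * L[src]
        den[dst] += k
    out = np.full(L.shape, np.nan)      # NaN where no cartesian pixel arrives
    out[den > 0] = num[den > 0] / den[den > 0]
    return out
```

Since the weights depend only on the label maps and on the translation, they can be computed once and reused for every image that shares the same logpolar layout.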

(a) Translated image on the cartesian domain and then foveated (b) Estimated translated image using the connectivity graph

Figure 4.6: Cartesian Translation vs. Super Pixel translation

4.4 Conclusions

In this chapter, one super pixel method for foveation was analyzed over several parameters, patches, and conditions. It always produced the intended visual transform and data compression.

However, the super pixel image transform is difficult to describe and implement, and it is not very stable with respect to the parameters (α, s and r). Using a connectivity graph, the pixel neighborhood relationship can be represented and simplified, and it is from that graph that translations can be simulated on the logpolar domain.

The reconstruction method is local and quite basic: it results from interpolating the logpolar image with polynomials of degree 0. Naturally, this very simple local method can be improved with linear cost.


Chapter 5

Gaussian Foveation

In this chapter a new approach to foveation is presented. This framework improves the biological inspiration, and its implementation is simpler and more elegant than Super Pixel approaches.

The usual Super Pixel methods have some problems resulting from the oversampling of the cartesian image. The pixel-neighborhood relations are not simple to identify, and the usual geometrical transforms are difficult to implement. Moreover, the biological inspiration can be improved.

The Gaussian Foveation approach concentrates on specific neural cells of the human fovea, the ganglion cells. A ganglion cell receives information from the photoreceptors (cones and rods) and sends it to the visual cortex.

The shape and sensitivity of the receptive field of the ganglion cells are not constant over the retina; once again, they depend on eccentricity.

In [9] a scale-space invariant approach to the formalization of receptive fields is presented. That work suggests Gaussian functions for representing receptive fields, where the mean determines the center and the variance defines the support.

The aim of this framework is also to simplify the calculus and representation of foveated images. Supported by some basics of functional analysis, it provides an elegant representation of the cortical image and of geometrical transformations on the image domain. Moreover, an easy method is suggested for adapting the usual operations on cartesian images (such as brightness variations, convolutions, etc.).

This chapter is organized as follows: the first section covers the biological inspiration, followed by the Gaussian Foveation method and image transforms; finally, some conclusions are drawn.


Figure 5.1: Ganglion Cells anatomy

5.1 Ganglion cells on the Human eye

In the previous chapter the structure of the human eye was presented; specifically, we described the physiology of the photoreceptors in the retina. The main photoreceptors in the retina are the cones and rods, each with specific functions in image perception. A super pixel approach to simulating human image acquisition was introduced, based on the distribution of foveal cones in the retina.

Here the concept of ganglion cell is introduced: a neuron whose main function is to receive information from the photoreceptors and send it to the brain.

Each ganglion cell receives information from a variable number of cones and rods, which depends not only on the distribution of cones and rods in the human eye, but also on its eccentricity relative to the fovea.

The receptive field of a sensory neuron is a region of space in which the presence of a stimulus will alter the firing of that neuron. The receptive fields of the ganglion cells are not uniform over the retina. In fact, they are also foveated: there are more receptive fields at the fovea, but their extent is smaller. Beyond that, the profile of each receptive field is not flat: it peaks at the center and decreases smoothly towards the periphery.

In summary, there are some constraints on modeling the receptive fields [2]:

• The diameter of the smallest receptive field is proportional to eccentricity;

• At any eccentricity all diameters greater than the corresponding smallest unit are present;

• Mean receptive field size increases linearly with eccentricity;

• The transformation from the visual field to the cortex is logarithmic and the Visual Cortex seems rather homogeneous;


Figure 5.2: Representation of receptive fields

• At any retinal location many receptive field sizes are present but smaller fields are located more centrally;

• The relative overlap of receptive fields is independent of eccentricity.

All these assumptions support the idea of modeling the receptive fields as Gaussians with a foveated distribution, where the mean and variance define their position and shape.

5.2 Gaussian Foveation

In this framework a formal model is proposed that describes the acquisition and transformation of the light signal, with a simplified representation of human vision and foveated images. In the foveal space, geometrical transforms can be introduced easily: we just need to apply the inverse geometric transform to the receptive fields. This idea can be generalized to any bounded linear transform. First, we must define the indexation method.

Definition Indexation

Let M be a matrix with dimension m × n.

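\[ \mathrm{Index} : (m \times n) \to \{1, \dots, m \cdot n\} \]

\[ \mathrm{Index}(i, j) = (j - 1) \cdot m + i \]

\[ \begin{bmatrix} (0, 0) & (0, 1) & \dots & (0, n - 1) \\ (1, 0) & (1, 1) & \dots & (1, n - 1) \\ \vdots & \vdots & \ddots & \vdots \\ (m - 1, 0) & (m - 1, 1) & \dots & (m - 1, n - 1) \end{bmatrix} \ \underset{\mathrm{index}^{-1}}{\overset{\mathrm{index}}{\rightleftarrows}} \ \begin{bmatrix} (0, 0) \\ (1, 0) \\ \vdots \\ (m - 1, 0) \\ (0, 1) \\ (1, 1) \\ \vdots \\ (m - 1, n - 1) \end{bmatrix} \]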


The inverse function is defined as:

Definition Inverse indexation

\[ \mathrm{Index}^{-1} : (m \cdot n \times m) \to (m \times n) \]

\[ \mathrm{Index}^{-1}(idx, m) = \left( (idx - 1) \bmod m + 1,\ \lfloor (idx - 1)/m \rfloor + 1 \right) \]

Using this indexation method, we can transform the matrix representation of the cartesian image, built over a grid of size (M, N), into a vector representation of size (M · N, 1). This representation is given by

\[ c(i, j) \mapsto c(\mathrm{index}(i, j)). \]

From now on, c is used to represent a cartesian image, regardless of the representation (matrix or vector).
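The indexation and its inverse can be checked with a short sketch. It uses the 1-based formula Index(i, j) = (j − 1) · m + i, and the inverse returns the pair in the order (i, j) so that it undoes Index; the function names are hypothetical.

```python
def index(i, j, m):
    """Column-major position of entry (i, j), 1-based, in a matrix with m rows."""
    return (j - 1) * m + i

def index_inv(idx, m):
    """Recover (i, j) from a 1-based column-major position."""
    return ((idx - 1) % m + 1, (idx - 1) // m + 1)
```

The entries of one column are stored consecutively, which matches the vector layout shown in the definition.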

Proposition 5.1

Let M, N be two fixed natural numbers. The set C of all cartesian images of size M × N, together with the usual addition, multiplication and norm

\[ \|c\| = \sqrt{ \sum_{i=0,\, j=0}^{m-1,\, n-1} c(i, j)^2 } \]

is a Hilbert Space.

Visual operations should be scale-space invariant. This means that “objects” manifest themselves as meaningful entities only over certain ranges of scales, and the Gaussian kernel is the unique scale-space operator for changing scale [5]. Therefore, we model receptive fields as Gaussian functions: the sensitivity of the ganglion cells is simulated by a set of bivariate Gaussians with a Log-polar distribution, where the mean of each Gaussian defines its center and the variance its support [9] [2].

\[ g_{\mu,\sigma}(x, y) = \frac{1}{\sigma \sqrt{2\pi}} \exp \left( - \frac{\| (x, y) - (\mu_x, \mu_y) \|^2}{2\sigma^2} \right) \]

Since the foveated image is the response of the receptive fields, it should have as many pixels as there are receptive fields, and each one should be given by:

\[ f_i = \langle \phi_i, c \rangle \]

where c is the representation of the real world image, i.e., c is a cartesian image (footnote 1).

1 In this case, both the receptive-field Gaussian and the image are in vector form.


Figure 5.3: Gaussians with various parameters

Definition Gaussian Foveation Operator

Let {φ_s}, s ∈ S, be a set of Gaussians with mean s and variance σ(s), verifying the conditions enumerated in section 5.1.

If C is the Cartesian Images Space and F the Foveated Images Space, the Gaussian Foveation Operator (Fov) is defined as:

$$\mathrm{Fov} : C \longrightarrow F, \qquad f_i = \langle \phi_i, c \rangle$$

This means that each sampling point is regarded as a ganglion cell. The receptive field of each ganglion cell is described by a Gaussian, and each foveated pixel corresponds to the response of the image on that receptive field.

This formulation allows the foveation method to be represented in a very simple and efficient way. Moreover, Fov is a bounded linear operator between two finite-dimensional Hilbert spaces, so it can be represented by a matrix. If φ denotes the matrix whose row i is the receptive field φ_i, then the definition is equivalent to:

$$f = \phi c \qquad (5.1)$$
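Equation 5.1 can be sketched directly as a matrix-vector product. The example below is a minimal illustration, not the thesis implementation: the centers and variances are made-up values rather than a log-polar layout, and the names `foveation_matrix` and `foveate` are hypothetical.

```python
import numpy as np


def foveation_matrix(centers, sigmas, m, n):
    """Stack one vectorised Gaussian receptive field per row (shape S x m*n)."""
    i, j = np.meshgrid(np.arange(m), np.arange(n), indexing="ij")
    rows = []
    for (mu_i, mu_j), sigma in zip(centers, sigmas):
        g = np.exp(-((i - mu_i) ** 2 + (j - mu_j) ** 2) / (2.0 * sigma ** 2))
        g = g / (sigma * np.sqrt(2.0 * np.pi))
        rows.append(g.reshape(-1, order="F"))
    return np.vstack(rows)


def foveate(phi, c):
    """Apply Fov: vectorise the Cartesian image and multiply by phi."""
    return phi @ c.reshape(-1, order="F")


rng = np.random.default_rng(0)
c = rng.random((8, 8))                    # stand-in Cartesian image
centers = [(4, 4), (2, 2), (6, 5)]        # hypothetical receptive-field centers
sigmas = [0.8, 1.5, 1.5]                  # hypothetical standard deviations
phi = foveation_matrix(centers, sigmas, 8, 8)
f = foveate(phi, c)                       # one foveated pixel per receptive field
```

Because φ depends only on the receptive-field layout, it can be built once and reused for every frame.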

Note that this approach is clearly global (all cartesian pixels contribute to the value of each logpolar pixel), which makes it computationally heavier but, a priori, gives smoother results, avoiding sharp vertices in the image.

Besides, this foveation principle can be implemented in hardware: it is possible to build cameras with a specific sensor distribution that generate the image as represented here [15]. In that case, acquiring the foveated image has no computational cost.


Figure 5.4: Cartesian Image and Gaussian Foveation reconstruction. Panels: (a) Cartesian Image, (b) Sampling, (c) Reconstructed Image.

Reconstruction

Once again, a method to reconstruct the image is presented. As with the super pixel method, it is impossible to recover the cartesian image perfectly, but we should look for the best approximation.

Once the foveated image f is known, it is desirable to find an operator Fov^{-1} that is the optimal solution of equation 5.1.

It is known that the least squares solution of this problem, i.e. the solution ĉ that minimizes ||c − ĉ||, is provided by the Moore-Penrose pseudo-inverse (chapter ??). Based on this result, the operator Fov^{-1} is defined; it is the right inverse of Fov and the best left inverse that can be found (in the least squares sense).

Definition Gaussian Foveation Pseudo-inverse Operator

If Fov is the Gaussian Foveation operator, its "inverse" is defined as:

$$\mathrm{Fov}^{-1} : F \longrightarrow C, \qquad f \longmapsto \phi^{+} f$$

where φ⁺ is the Moore-Penrose pseudo-inverse of φ.
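A small numerical sketch of this definition, using NumPy's `np.linalg.pinv`. The matrix below is a random stand-in for φ (an assumption made only to keep the example short), and the function name `reconstruct` is hypothetical.

```python
import numpy as np


def reconstruct(phi, f):
    """Apply the pseudo-inverse operator: c_hat = phi^+ f."""
    return np.linalg.pinv(phi) @ f


rng = np.random.default_rng(1)
S, M, N = 6, 5, 5
phi = rng.random((S, M * N))   # random stand-in for a receptive-field matrix
c = rng.random(M * N)          # vectorised Cartesian image

f = phi @ c                    # foveated image
c_hat = reconstruct(phi, f)    # reconstructed Cartesian image
```

With fewer receptive fields than Cartesian pixels, ĉ reproduces the foveated image exactly (φ ĉ = f) but is in general different from c; among all images with the same foveated response, the pseudo-inverse picks the one of smallest norm.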

Since this is a global approach, the number of computations grows with the size of the matrix: if there are S logpolar receptive fields, the computational cost is S × M × N, where M and N are the dimensions of the cartesian image.

Depending on the shape and distribution of the receptive fields, the matrix φ can be ill conditioned, meaning that φ*φ cannot be inverted reliably. To avoid this problem, Tikhonov regularization can be applied; it is quite simple to implement and produces good results.

The value of the Tikhonov regularization factor τ depends on the characteristics of φ. In this case the optimal value is around 10^{-4}, but this cannot be generalized. However, in this work τ = 0 was used, i.e., no regularization was applied, since it was outside the scope of the project.
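One common way to write Tikhonov regularization is ĉ = (φ*φ + τI)^{-1} φ* f, which for τ > 0 equals φ*(φφ* + τI)^{-1} f. The sketch below uses the second form because it only requires solving an S × S system; this choice, the random stand-in matrix and the function name `tikhonov_reconstruct` are assumptions of the example, not details taken from the text.

```python
import numpy as np


def tikhonov_reconstruct(phi, f, tau):
    """Minimise ||phi c - f||^2 + tau ||c||^2 by solving an S x S linear system."""
    S = phi.shape[0]
    return phi.T @ np.linalg.solve(phi @ phi.T + tau * np.eye(S), f)


rng = np.random.default_rng(2)
phi = rng.random((6, 25))      # random stand-in for a receptive-field matrix
f = rng.random(6)              # stand-in foveated image
c_reg = tikhonov_reconstruct(phi, f, tau=1e-4)
```

As τ tends to 0 the result approaches the pseudo-inverse solution φ⁺ f when φ has full row rank; larger τ trades fidelity to f for a better conditioned system.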


Figure 5.5: Influence of regularization depending on the characteristics of φ. Panels: (a) Initial image; (b) 324 foveated pixels; (c) 1284 foveated pixels; (d) 2755 foveated pixels. In plots (b) to (d) the horizontal axis runs over the values 1, 0.1, 0.01, 0.001, 10e−4, ..., 10e−10.

5.3 Image Transforms with Gaussian Foveation

As with the super-pixel methods, this section proposes a method to perform geometric image transforms.

These transforms were defined on the cartesian domain in chapter 3. Such transforms are operators over the Hilbert space of cartesian images; moreover, they are linear and bounded.

Recall that a foveated pixel i is given by:

$$f_i = \langle \phi_i, c \rangle$$

so, if T is a cartesian transform, the same transform applied to that foveated pixel, T_f(f_i), must be equal to:

$$T_f(f_i) = \langle \phi_i, T(c) \rangle$$

By theorem 2.4, there is an operator T* such that

$$T_f(f_i) = \langle T^{*}(\phi_i), c \rangle$$

where T* is the adjoint operator of T. This means that any geometrical transform on the cartesian domain can be described by its adjoint on the foveal domain. But how is T* defined? From the geometry of the image acquisition, it is easy to deduce that T* = T^{-1}, where T^{-1} is the inverse transform of T [figure 5.6].


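Figure 5.6: Inspiration for translation adjoint operator. Panels: (a) Initial image; (b) Object translates to the right; (c) Ganglion cells translate to the left.

For a translation T by (Δi, Δj), that is, T(c)(i, j) = c(i − Δi, j − Δj) wherever the shifted index lies inside the grid:

$$\begin{aligned}
\langle \phi_k, T(c) \rangle
&= \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \phi_k(i, j)\, T(c)(i, j) \\
&= \sum_{i=\max\{0,\Delta i\}}^{\min\{m-1,\, m-1+\Delta i\}} \; \sum_{j=\max\{0,\Delta j\}}^{\min\{n-1,\, n-1+\Delta j\}} \phi_k(i, j)\, c(i - \Delta i, j - \Delta j), \qquad \text{take } i' = i - \Delta i,\; j' = j - \Delta j \\
&= \sum_{i'=\max\{0,-\Delta i\}}^{\min\{m-1,\, m-1-\Delta i\}} \; \sum_{j'=\max\{0,-\Delta j\}}^{\min\{n-1,\, n-1-\Delta j\}} \phi_k(i' + \Delta i, j' + \Delta j)\, c(i', j') \\
&= \langle T^{-1}(\phi_k), c \rangle
\end{aligned}$$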
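The identity ⟨φ_k, T(c)⟩ = ⟨T^{-1}(φ_k), c⟩ for translations can be checked numerically. The sketch below assumes a zero-filled shift (pixels that leave the grid are discarded and empty positions are set to 0), which matches the summation limits in the derivation; the function name `translate` and the random test arrays are hypothetical.

```python
import numpy as np


def translate(img, di, dj):
    """Zero-filled shift: out[i, j] = img[i - di, j - dj] where that index exists."""
    m, n = img.shape
    out = np.zeros_like(img)
    # Destination and source windows; the max(0, ...) on the stops keeps
    # them empty (instead of wrapping) when the shift exceeds the image size.
    dst_i = slice(max(0, di), max(0, min(m, m + di)))
    src_i = slice(max(0, -di), max(0, min(m, m - di)))
    dst_j = slice(max(0, dj), max(0, min(n, n + dj)))
    src_j = slice(max(0, -dj), max(0, min(n, n - dj)))
    out[dst_i, dst_j] = img[src_i, src_j]
    return out


rng = np.random.default_rng(0)
phi_k = rng.random((6, 8))     # stand-in receptive field, in matrix form
c = rng.random((6, 8))         # stand-in Cartesian image
di, dj = 2, -3

lhs = np.sum(phi_k * translate(c, di, dj))        # <phi_k, T(c)>
rhs = np.sum(translate(phi_k, -di, -dj) * c)      # <T^{-1}(phi_k), c>
```

Both inner products sum exactly the same products φ_k(i, j) · c(i − Δi, j − Δj), only enumerated with different indices, so `lhs` and `rhs` agree up to floating-point rounding.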

This leads to the following definition:

Definition Geometric transforms on foveated images

Let T be a usual geometric transform of the image space. The best approximation of that transform on the foveal domain is given by:

$$T_f : F \longrightarrow F, \qquad T_f(f_i) = \langle T^{-1}(\phi_i), \phi^{+} f \rangle = \langle T^{-1}(\phi_i)\, \phi^{+}, f \rangle$$

This means that the transform can be implemented, once again, using simple matrix operations:

$$T_f(f) = T^{-1}(\phi)\, \phi^{+} f$$

where T^{-1}(φ) is the matrix whose row i is given by T^{-1}(φ_i).

This transform is computationally heavy: the inverse transform is applied to every receptive field. However, on some systems these computations can be performed offline, so the computation time is not very relevant.
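A possible sketch of the matrix T^{-1}(φ) φ⁺ for a translation, under several assumptions made only for illustration: a zero-filled shift, unnormalised Gaussians on a small regular lattice instead of a log-polar layout, and the hypothetical helper names `translate`, `vec`, `unvec`, `gaussian_rows` and `foveal_translation`.

```python
import numpy as np


def translate(img, di, dj):
    """Zero-filled shift: out[i, j] = img[i - di, j - dj] where that index exists."""
    m, n = img.shape
    out = np.zeros_like(img)
    dst_i = slice(max(0, di), max(0, min(m, m + di)))
    src_i = slice(max(0, -di), max(0, min(m, m - di)))
    dst_j = slice(max(0, dj), max(0, min(n, n + dj)))
    src_j = slice(max(0, -dj), max(0, min(n, n - dj)))
    out[dst_i, dst_j] = img[src_i, src_j]
    return out


def vec(img):
    return img.reshape(-1, order="F")          # matrix form -> vector form


def unvec(v, m, n):
    return v.reshape((m, n), order="F")        # vector form -> matrix form


def gaussian_rows(centers, sigma, m, n):
    """Unnormalised Gaussian receptive fields, one per row."""
    i, j = np.meshgrid(np.arange(m), np.arange(n), indexing="ij")
    return np.vstack([
        vec(np.exp(-((i - a) ** 2 + (j - b) ** 2) / (2.0 * sigma ** 2)))
        for a, b in centers
    ])


def foveal_translation(phi, m, n, di, dj):
    """Matrix of T_f: rows T^{-1}(phi_i) stacked, times the pseudo-inverse of phi."""
    shifted = np.vstack([vec(translate(unvec(row, m, n), -di, -dj)) for row in phi])
    return shifted @ np.linalg.pinv(phi)


m = n = 12
centers = [(a, b) for a in (2, 6, 10) for b in (2, 6, 10)]   # hypothetical layout
phi = gaussian_rows(centers, 1.5, m, n)
T_f = foveal_translation(phi, m, n, di=1, dj=-2)             # S x S matrix
```

Since `T_f` depends only on the receptive fields and on the translation, it can be precomputed for every motion hypothesis; applying it online is then a single S × S matrix-vector product.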

Most of the usual operations on the cartesian domain can be seen as operators (convolutions, filters, derivatives, ...). Therefore, if these operators are linear and bounded, they have an adjoint, which can easily be applied to the Gaussians to produce the required transformation of the foveated image.

5.4 Conclusions

In this chapter, the Gaussian Foveation method was proposed. This method, based on the receptive fields of ganglion cells, produces foveated images using simple matrix operations.

This formulation provides a simple way to reconstruct the Cartesian image: using the pseudo-inverse, we get the least squares approximation of the original image. However, since the approach is global, this method is very costly.

Tikhonov regularization produces good results in reconstruction, but the optimal value of the factor is not stable.

A simple technique to adapt all bounded linear operators of the cartesian image space was introduced. Using basic operator theory, it was shown that these operators can be implemented on the receptive fields using the adjoint operator. Moreover, they can be applied offline.


Chapter 6

Super Pixel vs Gaussian Foveation

The objective of this chapter is to compare the Gaussian Foveation with the Super Pixel Foveation approach presented here. The comparison starts with the quality of reconstruction and then moves to the tracking problem.

For a fair comparison, the number of sampling points should be the same, and the areas of the logpolar pixels should be related. Note that in the Super Pixel approach these parameters are difficult to control, whereas the flexibility of the Gaussian Foveation allows us to control the number of samples, their positions, and their areas.

Definition Centroid of a Logpolar pixel

If (u, v) is a logpolar pixel, then its centroid is defined as

$$\mu(u, v) = \frac{1}{a(u, v)} \sum_{i, j} (i, j)\, 1_{S_u}(i, j)\, 1_{R_v}(i, j)$$

i.e., it is the cartesian coordinate of the center of the logpolar pixel, where a(u, v) is the area of the pixel (u, v).

To obtain the location and size of the receptive fields, the pixel distribution was first calculated with the Super Pixel method, and the centroid and area of each logpolar pixel were stored. The centroid coordinates give the center of the receptive field. The area of the receptive field was related to the area of the super pixel by a factor of 3, providing enough overlap between the receptive fields for a good reconstruction.
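The centroid and area of every logpolar pixel can be computed in one pass over the grid. The sketch below assumes a representation that is not described in the text: a label image in which each cartesian pixel stores the integer id of the logpolar pixel it belongs to. The function name `centroids_and_areas` is hypothetical.

```python
import numpy as np


def centroids_and_areas(labels, num_pixels):
    """Return (mu, area): mu[k] is the centroid (row, column) of logpolar pixel k."""
    m, n = labels.shape
    i, j = np.meshgrid(np.arange(m), np.arange(n), indexing="ij")
    flat = labels.ravel()
    # Area = number of Cartesian pixels carrying each label.
    area = np.bincount(flat, minlength=num_pixels).astype(float)
    # Sums of row and column coordinates per label.
    sum_i = np.bincount(flat, weights=i.ravel(), minlength=num_pixels)
    sum_j = np.bincount(flat, weights=j.ravel(), minlength=num_pixels)
    with np.errstate(invalid="ignore", divide="ignore"):
        mu = np.stack([sum_i / area, sum_j / area], axis=1)  # NaN for empty labels
    return mu, area


# Toy label image with two "logpolar pixels" (ids 0 and 1).
labels = np.array([[0, 0, 1],
                   [0, 0, 1]])
mu, area = centroids_and_areas(labels, 2)
```

The rows of `mu` would then serve as Gaussian centers, and `area` as the quantity from which the receptive-field size is derived.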


Method              Aerials  Miscellaneous  Sequences  Textures
Super Pixel         1.93%    3.66%          3.75%      7.03%
Gaussian Foveation  1.88%    3.08%          2.86%      6.83%

Table 6.1: Error average of the image reconstruction

Method              Aerials  Miscellaneous  Sequences  Textures
Super Pixel         0.03%    0.001%         0.02%      0.41%
Gaussian Foveation  0.02%    0.08%          0.02%      0.394%

Table 6.2: Error variance of the image reconstruction

6.1 Reconstruction

This section compares the reconstruction methods presented in chapters 4 and 5. The images were taken from the "USC-SIPI Image Database". This database contains 4 types of images: Aerials (38 images), Miscellaneous (44 images), Sequences (70 images in 4 sequences) and Textures (154 images).

For each cartesian image c, L^{-1}(L(c)) and Fov^{-1}(Fov(c)) were computed, and then the MSE (Mean Square Error) was calculated using the norm defined on the Cartesian Images Space. The average and variance of these error values were then calculated by image type.
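A minimal sketch of this evaluation step. The exact normalisation used in the thesis is not stated, so the example assumes intensities scaled to [0, 1] and an MSE equal to the squared norm of the difference divided by the number of pixels, which keeps the error between 0 and 1; the variance is the population variance. The names `mse` and `summarize` are hypothetical.

```python
import numpy as np


def mse(c, c_hat):
    """Mean square error between an image and its reconstruction (values in [0, 1])."""
    diff = np.asarray(c, dtype=float) - np.asarray(c_hat, dtype=float)
    return float(np.sum(diff ** 2) / diff.size)


def summarize(errors_by_class):
    """Map each image class to (average error, error variance)."""
    return {name: (float(np.mean(e)), float(np.var(e)))
            for name, e in errors_by_class.items()}


# Toy usage with made-up error values, not results from the thesis.
stats = summarize({"ClassA": [0.0, 0.2], "ClassB": [0.1, 0.1, 0.1]})
```

Under these assumptions a perfect reconstruction gives 0, and reconstructing an all-white image as all-black gives 1.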

The Mean Square Error ranges between 0 and 1, where 0 means that the reconstruction is perfect and 1 is the worst case.

In all image classes, the average and variance of the errors of both approaches were very close. The Gaussian Foveation had the lower average error in every class (Table 6.1), and its error variance was lower than or equal to that of the Super Pixel method in every class except Miscellaneous (Table 6.2).

Note that the images in the Aerials and Sequences classes are quite similar, which gives less variance in the reconstruction. On the other hand, texture images have a pattern that repeats along the image, so the basic principle that inspires foveal approaches (the important information is in the center) is broken, inducing higher errors. In this class the variance is higher because the number of images is much larger than in the other classes.

The Gaussian Foveation method improves the reconstruction performance. However, the differences are not very relevant.

6.2 Tracking

The tracking problem consists in discovering the motion of an object over time. There are numerous methods to solve this problem; template matching was the one applied here.


Table 6.3: Reconstruction examples with Super Pixel and Gaussian Foveation. Columns: Cartesian, Super Pixel Foveation, Gaussian Foveation. Rows: Aerials, Miscellaneous, Sequences, Textures.


The template matching method is very simple: the main idea is to search among "all possible" motions for the one that produces the smallest error.

Figure 6.1: Translations hypothesis grid, with resolution ∆h = 1 and ∆v = 1 pixels

Ideally, starting from the first frame, called the reference frame, we would simulate all possible motions. However, the memory of the robot is finite, so it is impossible to simulate all motions. Consequently, only a finite grid with some resolution is generated.

In these tests, the grid resolution was ∆h = 2 and ∆v = 2 pixels (horizontal and vertical, respectively) and the size of the grid was 7 × 7.
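The search over the hypothesis grid can be sketched as follows. For brevity the example works on plain cartesian arrays with a zero-filled shift and a sum-of-squared-differences error; in the thesis the comparison is made on foveated images, so this is an illustration of the search only. Offsets are stored as (vertical, horizontal) pairs, and the names `hypothesis_grid`, `shift` and `best_translation` are hypothetical.

```python
import numpy as np
from itertools import product


def hypothesis_grid(size=7, step=2):
    """All (vertical, horizontal) translations on a size x size grid with the given step."""
    half = size // 2
    offsets = [step * k for k in range(-half, half + 1)]
    return list(product(offsets, offsets))


def shift(img, di, dj):
    """Zero-filled shift: out[i, j] = img[i - di, j - dj] where that index exists."""
    m, n = img.shape
    out = np.zeros_like(img)
    dst_i = slice(max(0, di), max(0, min(m, m + di)))
    src_i = slice(max(0, -di), max(0, min(m, m - di)))
    dst_j = slice(max(0, dj), max(0, min(n, n + dj)))
    src_j = slice(max(0, -dj), max(0, min(n, n - dj)))
    out[dst_i, dst_j] = img[src_i, src_j]
    return out


def best_translation(reference, frame, hypotheses):
    """Return the hypothesis whose predicted image is closest to the frame."""
    errors = [np.sum((shift(reference, di, dj) - frame) ** 2) for di, dj in hypotheses]
    return hypotheses[int(np.argmin(errors))]


grid = hypothesis_grid(size=7, step=2)          # 49 hypotheses, offsets -6..6
rng = np.random.default_rng(0)
reference = rng.random((32, 32))                # stand-in reference frame
frame = shift(reference, 2, -4)                 # simulated next frame
estimate = best_translation(reference, frame, grid)
```

A motion larger than the grid extent (here 6 pixels in each direction) cannot be recovered, which is the limitation probed by the velocity tests.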

There are two possible approaches to this problem: a passive one (the camera is fixed) and an active one (the camera dynamically changes its parameters to track the object). The active search makes more sense in a biological paradigm, because human eyes track objects as they move in the environment, trying to center their projections on the fovea. Tests were made with both approaches, using both films.

The tracking algorithms were applied several times with the two foveation methods, always under the same conditions: the same number of logpolar pixels, resolution, grid size, and videos. For simplicity, the only motions allowed were compositions of translations (vertical and horizontal). In each film, the estimated position of the focus point is marked with a red dot.

Passive Tracking

Figure 6.2: Passive tracking using Gaussian Foveation on Film A. Panels: (a) Frame 15, (b) Frame 150, (c) Frame 350, (d) Frame 450.

Film    Frames  Pixels     Object/Point Tracked  Observation
Film A  470     480 × 640  stapler               The scene is fixed; the camera is moving
Film B  336     230 × 360  face                  The camera is fixed; the scene is moving

Table 6.4: Basic properties of the variables to track

The main idea of passive tracking is that the camera is fixed (it is passive), so when we want to compare frame i with the predicted images, we must first apply to the image the motion estimated for frame i−1. Thus, the tracking algorithm has two steps.

For each foveated frame i:

1. simulate the motion estimated for frame i−1;

2. compare the resulting image with the predicted images.

The tracking was applied with the two methods to film A, producing clear results. The results of both approaches were quite similar: the tracking was correct, and the maximum distance to the ground truth was 4.6 pixels for the Super Pixel and 4.1 pixels for the Gaussian Foveation.

This tracking algorithm has one problem: it is computationally heavy, since the inverse transform of the previous frame has to be computed online, frame by frame.

To avoid this problem, another method based on active tracking was considered.

Active Tracking

This approach simulates a camera that moves and follows a specific point in the image. The coordinates of the focus point therefore change with the motion and must be updated frame by frame.

Once again, both approaches estimated the motion with high quality; in fact, the difference between the errors is barely perceptible. For this reason, another performance test was made.

This new test checks the robustness of the method to longer motions, simulated by increasing the film velocity. When the velocity of the film is increased, the motion between two consecutive frames becomes larger and may fall outside the grid.

The only difference between this test and the previous one is that, instead of predicting the motion frame by frame, if the velocity is k then only every k-th frame is analyzed.

The test was again run on film A, simulating several velocities. In these tests both approaches gave the same result: when the velocity is increased to 3, neither can estimate the motion correctly (figure 6.4).

Since both approaches tracked film A well, the tests were repeated on another film, B. On this film, the tests were run for two focus points, here denoted X and Y.


Figure 6.3: Estimate motions and distances to ground truth in passive tracking. Panels: (a) Estimated motion with Super Pixel vs. ground truth; (b) Estimated motion with Gaussian Foveation vs. ground truth; (c) Distance between the estimated motion with Super Pixel and ground truth; (d) Distance between the estimated motion with Gaussian Foveation and ground truth. Legend in (a) and (b): ground truth, estimated motion.

Figure 6.4: Distances to ground truth on film A with velocity one, two, and three. Panels: (a) Super Pixel, (b) Gaussian Foveation.


Figure 6.5: Active tracking using Super Pixel on Film B. Panels: (a) Frame 15, (b) Frame 150, (c) Frame 250, (d) Frame 260.

Figure 6.6: Estimate motion with film B. Panels: (a) Super Pixel, (b) Gaussian Foveation. Legend: ground truth, estimated motion.

For point X, the results were the same as in film A: both algorithms gave the correct result and, when the velocity was increased, they failed in the same case.

However, for point Y the tracking algorithms did not perform equally: the Super Pixel approach failed at velocity 1, while the Gaussian Foveation gave the correct result.

This situation shows that Gaussian Foveation is more robust to image characteristics than the Super Pixel approach.


6.3 Conclusions

In this chapter, we have verified that the Super Pixel approach and the Gaussian Foveation behave similarly in reconstruction and tracking: the reconstructed image is very close to the reference one (particularly in the center), and the tracking is very robust using the template matching method.

However, the Gaussian Foveation produces a small improvement in the results, both in reconstruction performance and in tracking robustness, although a deeper statistical evaluation of its performance should be made to further support this claim.


Chapter 7

Conclusions

Foveal vision should be implemented on humanoid systems, since it provides a large data compression while remaining robust for tracking in different environments. This happens because, as in human vision, the central information is kept.

Despite all the scientific work on the human retina, foveation methods have not been explored enough, because this new paradigm requires hard work to reformulate the usual cartesian methods. Most of the usual approaches do not simulate the cortical image in a realistic way, since they are based only on the distribution of photoreceptors and not on receptive fields.

The usual Super Pixel approaches are very simple to describe but confusing to implement, because of the heavy bookkeeping on pixel indices. The reconstruction method is very elementary, but it can be improved with higher-degree interpolation. The neighborhood relationship is very complex; we have to resort to the connectivity graph to represent it. Consequently, the usual pixel operations (from simple geometrical transforms to filters) must be carefully redefined and implemented.

The approach presented here is inspired by complex biological structures: the receptive fields of ganglion cells. The foveation method was implemented using simple matrix operations, which can be very costly but provide good results. The reconstruction method provides the least squares solution, but it can be ill conditioned due to the distribution of the receptive fields. However, this problem can be solved with regularization methods. By employing adjoint operators, it is possible to implement the usual cartesian transforms on the foveal domain, which means that the problems related to the neighborhood relationship do not arise here.

Despite the improvement in the biological model, the Gaussian Foveation results were not very different from those of the Super Pixel approach. However, due to its flexibility, simple improvements can be made that may provide better results.


Future Work

The approach presented here can be refined in different directions.

It was not within the scope of this work to explore the sampling function, but it can be improved in two ways. On the one hand, there are some biological constraints that are not satisfied by the sampling generated by the Super Pixel Foveation method. On the other hand, it can be optimized by keeping the reconstruction performance while redistributing the pixels in a logpolar way.

There are biological works that model receptive fields as Differences of Gaussians instead of Gaussians. Of course, this approach loses the scale-space invariance, but gains in biological plausibility.

The ill conditioning of φ*φ is clearly related to the shape, distribution and density of the receptive fields. Using Frame Theory, it is possible to relate the conditioning of φ directly to the functions φ_i [12], and to measure the "quality" of the functions in terms of how closely they approximate an orthonormal basis (the closer {φ_i} is to an orthonormal basis, the better conditioned φ is). Therefore, Frame Theory could be very helpful in finding the optimal solution for the receptive field model.

The Gaussian Foveation can also be improved in another way: a model should be found that provides the maximum field of view (with high quality) for a restricted number of receptive fields. For that, several tools from Mathematics can be useful: Functional Analysis, Operator Theory, Numerical Analysis, and others. Moreover, regularization methods (such as the one presented here) can be helpful in image reconstruction. Ideally, a factor τ should be found that improves the results on most images.

The usual image processing methods (like filters) should be adapted to the foveal domain using operator theory. Ideally, a library with these new methods can be constructed.

The tracking tests should be extended to other geometrical transforms, such as scaling, rotations, affine and projective transforms.


List of Figures

1.1 Brodmann areas on human brain
1.2 World image, retinal image and cortical image

2.1 Projection on the column space of a 3 × 2 matrix

3.1 Cartesian neighborhood
3.2 Stereotyped human
3.3 Rotation of one image

4.1 Coordinates Distribution
4.2 Log-polar pixels distribution over one image
4.3 Cartesian and reconstructed Log-polar image
4.4 Resulting reconstructed images with different parameters
4.5 Connectivity graph of a logpolar image
4.6 Cartesian Translation vs. Super Pixel translation

5.1 Ganglion Cells anatomy
5.2 Representation of receptive fields
5.3 Gaussians with various parameters
5.4 Cartesian Image and Gaussian Foveation reconstruction
5.5 Influence of regularization depending on the characteristics of φ
5.6 Inspiration for translation adjoint operator

6.1 Translations hypothesis grid, with resolution ∆h = 1 and ∆v = 1 pixels
6.2 Passive tracking using Gaussian Foveation on Film A
6.3 Estimate motions and distances to ground truth in passive tracking
6.4 Distances to ground truth on film A with velocity one, two, and three
6.5 Active tracking using Super Pixel on Film B
6.6 Estimate motion with film B


List of Tables

6.1 Error average of the image reconstruction
6.2 Error variance of the image reconstruction
6.3 Reconstruction examples with Super Pixel and Gaussian Foveation
6.4 Basics properties of the variables to track


Bibliography

[1] C.J.S. Alves, Fundamentos de Análise Numérica I, AEIST, Instituto Superior Técnico, 2001.

[2] A. Bernardino, Binocular Head Control with Foveal Vision: Methods and Applications, PhD Thesis, Instituto Superior Técnico, Universidade Técnica de Lisboa, Portugal, 2004.

[3] S. C. Brenner, L. R. Scott, The Mathematical Theory of Finite Element Methods, Springer, 2002, ISBN 0387954511.

[4] I. Gohberg and S. Goldberg, Basic Operator Theory, Birkhäuser, 1981.

[5] B. M. ter Haar Romeny, Front-End Vision and Multi-Scale Image Analysis, Springer, 2003, ISBN 1-4020-1503-8.

[6] E. R. Kandel, J. H. Schwartz, T. M. Jessell, Essentials of Neural Science and Behavior, Appleton and Lange, 1995, ISBN 978-0-8385-2245-5.

[7] H. Kolb, How the Retina Works, American Scientist, Volume 91, Number 1, January-February 2003.

[8] M.W. Levine and J.M. Shefner, Fundamentals of Sensation and Perception, 2nd ed., Pacific Grove, CA: Brooks/Cole, 1991.

[9] T. Lindeberg and L. Florack, Foveal scale-space and linear increase of receptive field size as a function of eccentricity, Technical Report ISRN KTH NA/P-94/24-SE.

[10] M. Sonka, V. Hlavac, R. Boyle, Image Processing, Analysis and Machine Vision, PWS Pub., 1999, ISBN 053495393X.

[11] G. Strang, Linear Algebra and Its Applications, 2nd ed., Academic Press, New York, 1980.

[12] T. Strohmer, Irregular Sampling, Frames and Pseudoinverse, Master Thesis, University of Vienna, Austria, 1991.

[13] F. S. Teixeira and A.B. Lebre, Apontamentos de Análise Funcional, Instituto Superior Técnico, 1995.

[14] R. Wallace, Pin-Wen Ong, B. Bederson, E. Schwartz, Space Variant Image Processing, Technical Report, 1993.

[15] R. Wodnicki, G.W. Roberts and M.D. Levine, A foveated image sensor in standard CMOS technology, Custom Integrated Circuits Conf., Santa Clara, May 1995.
