Institute of Computer Science and Information Engineering, 洪詡淮 (p76994610)


Image and Vision Computing 25 (2007) 1802-1813

Deformation tolerant generalized Hough transform for sketch-based image retrieval in complex scenes

M. Anelli, L. Cinque, Enver Sangineto

Outline

1. Introduction

2. Methods

3. Results

4. Conclusion

Introduction

Introduction (1/4)

In the last 12-15 years, the availability of digital visual information has grown very quickly.

Content Based Image Retrieval (CBIR) is a research area whose aim is the development of tools for retrieving visual information using its perceptual content.

Note: retrieval relies on image features, hence "content based".

Introduction (2/4)

In Image Retrieval by Sketch, the query is a stylized sketch drawn by the user to specify the shape features she is interested in finding in the images of the system's database.

Inexact matching between the sketch and the images, and segmentation, are the two main problems that a sketch-based image retrieval system has to deal with.

Note: the sketch is stylized, which leads to the inexact matching problem; objects that are not isolated or lie on a non-uniform background lead to the segmentation problem.

Introduction (3/4)

Most of the methods and techniques for shape-based image retrieval can be classified into three main categories:

statistical techniques

deformable template matching

multiscale representations

Introduction (4/4)

Modifications to the GHT: first, we spread the voting result in order to deal with small local deformations without increasing the overall asymptotic space and time complexity. Moreover, once the most likely position of the sketch in the image has been localized using the votes in the accumulator, shape segmentation is further verified.

Note: the GHT addresses segmentation, while vote spreading addresses inexact matching.

Methods

Canny edge detection

The first filter aims at deleting edge pixels surrounded by a disordered and thick texture.

The second filter deals with ordered textures (e.g., a sheaf of parallel lines).

The first filter

$C(p)$: a square mask of $n_1 \times n_1$ pixels centered at pixel $p$.
$N$: the number of edge pixels in $C(p)$.
$\theta(p)$: the gradient direction of a generic edge pixel $p$.

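We cancel the edge pixel $p$ from the edge map if $N > \tau_1 \wedge \sigma^2(p) > \tau_2$, where $\sigma^2(p)$ is the variance of the gradient directions $\theta$ of the edge pixels in $C(p)$, and $\tau_1$ and $\tau_2$ are two pre-fixed thresholds ($n_1 = 40$, $\tau_1 = 260$, $\tau_2 = 0.165$).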
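A minimal Python sketch of the first salience filter, assuming edge pixels are given as a set of (x, y) coordinates plus a dict of gradient directions, and assuming the dispersion measure is the plain (non-circular) variance of the directions in C(p). Because n1 = 40 is even, the window here is made odd-sized (2·⌊n1/2⌋ + 1) so it can be centred on p; that choice is also an assumption. The name `first_salience_filter` is hypothetical.

```python
def first_salience_filter(edges, theta, n1=40, tau1=260, tau2=0.165):
    """Return the edge pixels that survive the first salience filter.

    edges: set of (x, y) edge pixels; theta: dict (x, y) -> gradient direction.
    Assumption: the dispersion test uses the plain variance of the directions.
    """
    half = n1 // 2  # odd window of side 2*half + 1 centred on p (assumption)
    kept = set()
    for (x, y) in edges:
        # Gradient directions of all edge pixels inside the mask C(p), p included.
        dirs = [theta[(x + dx, y + dy)]
                for dx in range(-half, half + 1)
                for dy in range(-half, half + 1)
                if (x + dx, y + dy) in edges]
        n = len(dirs)  # N in the slide
        mean = sum(dirs) / n
        var = sum((d - mean) ** 2 for d in dirs) / n
        # Cancel p only if its neighbourhood is both dense and disordered.
        if not (n > tau1 and var > tau2):
            kept.add((x, y))
    return kept
```

A dense patch whose directions alternate is removed entirely, while the same patch with uniform directions survives, which is the behaviour the filter is meant to have for "disordered and thick" texture.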

The second filter

Let $N'$ be the number of edge pixels $p'$ belonging to the $n_2 \times n_2$ mask $D(p)$ such that $\theta(p') = \theta(p)$.

We cancel $p$ if $N' > \tau_3$ ($n_2 = 20$, $\tau_3 = 120$).

From now on, we denote with $I$ the edge map of the currently analyzed image of the system's database after applying the salience filters.
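The second salience filter can be sketched the same way. This assumes the directions are already quantized (so that the equality θ(p') = θ(p) is meaningful) and again uses an odd window centred on p; `second_salience_filter` is a hypothetical name.

```python
def second_salience_filter(edges, theta, n2=20, tau3=120):
    """Remove edge pixels lying in dense ordered textures (e.g. parallel lines).

    Assumption: theta holds quantized directions, so equality is meaningful.
    """
    half = n2 // 2  # odd window of side 2*half + 1 centred on p (assumption)
    kept = set()
    for (x, y) in edges:
        # N': edge pixels of D(p) with exactly the same direction as p (p included).
        same = sum(1
                   for dx in range(-half, half + 1)
                   for dy in range(-half, half + 1)
                   if (x + dx, y + dy) in edges
                   and theta[(x + dx, y + dy)] == theta[(x, y)])
        if same <= tau3:
            kept.add((x, y))
    return kept
```

On a sheaf of parallel vertical lines, pixels in the middle of the sheaf see many same-direction neighbours and are removed, while pixels at the corner of the sheaf see fewer and may survive.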

Generalized Hough Transform (GHT)

The GHT detects an arbitrary shape described by a template.

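I. R-table construction

Step 1: Choose a reference point $(x_c, y_c)$ for the template.
Step 2: Create an R-table with rows indexed by $\phi_i$, $i = 1, 2, \dots, K$, quantizing the gradient directions from 0° to 180° in steps of $180°/K$.
Step 3: For each boundary point $(x, y)$ of the template, compute $(r, \alpha)$, the distance and the angle from $(x, y)$ to the reference point.
Step 4: Compute the gradient direction $\phi$ at $(x, y)$ and store $(r, \alpha)$ in row $i$, where $\phi_i$ is the quantized value of $\phi$.
Step 5: Repeat Steps 3 and 4 for all boundary points to complete the R-table.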
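The R-table construction steps above can be sketched as follows. Directions are assumed to be in degrees in [0, 180), and the template boundary is assumed to come with its gradient directions; `build_r_table` and its parameters are hypothetical.

```python
import math
from collections import defaultdict


def build_r_table(boundary, reference, K=36):
    """Phase I of the GHT: build the R-table of a template.

    boundary: list of (x, y, phi), phi = gradient direction in degrees in [0, 180).
    reference: the reference point (xc, yc) chosen in Step 1.
    """
    xc, yc = reference
    step = 180.0 / K                        # Step 2: K direction bins over 0..180 degrees
    table = defaultdict(list)
    for x, y, phi in boundary:
        r = math.hypot(xc - x, yc - y)      # Step 3: distance to the reference point
        alpha = math.atan2(yc - y, xc - x)  # Step 3: angle of the vector towards it
        i = int(phi // step) % K            # Step 4: row of the quantized direction
        table[i].append((r, alpha))
    return table                            # Step 5: all boundary points processed
```

Storing α as the angle of the vector pointing towards the reference point means detection can recover the reference point as (x + r cos α, y + r sin α).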

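II. Detection

Step 1: Initialize a 2D Hough table $H(x_c, y_c)$ to 0.
Step 2: For each edge point $(x, y)$ of the image, compute its gradient direction $\phi$.
Step 3: In the R-table, find the row $i$ whose $\phi_i$ matches $\phi$ and retrieve all its $(r, \alpha)$ entries.
Step 4: For each $(r, \alpha)$, compute $x_c = x + r\cos\alpha$, $y_c = y + r\sin\alpha$ and increment $H(x_c, y_c)$ by 1; repeat Steps 2-4 for every edge point.
Step 5: The maxima of $H(x_c, y_c)$ give the positions $(x_c, y_c)$ of the detected shape.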
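The detection phase can be sketched as a voting loop over the image edge points. The R-table is assumed to be a dict from direction bin to (r, α) pairs, built with the same K as here; rounding the candidate point to the nearest pixel is an assumption, and `ght_detect` is a hypothetical name.

```python
import math
from collections import defaultdict


def ght_detect(edge_points, table, K=36):
    """Phase II of the GHT: vote in the Hough table H and return its maximum.

    edge_points: list of (x, y, phi) with phi in degrees in [0, 180).
    table: dict mapping a direction bin i to a list of (r, alpha) pairs.
    """
    step = 180.0 / K
    H = defaultdict(int)                         # Step 1: H(xc, yc) = 0
    for x, y, phi in edge_points:                # Step 2: every edge point
        i = int(phi // step) % K                 # Step 3: matching R-table row
        for r, alpha in table.get(i, []):
            xc = round(x + r * math.cos(alpha))  # Step 4: candidate reference point
            yc = round(y + r * math.sin(alpha))
            H[(xc, yc)] += 1
    if not H:
        return None, H
    return max(H, key=H.get), H                  # Step 5: the peak locates the shape
```

For a two-point template with reference (1, 1), shifting the template by (10, 5) in the image makes both points vote for (11, 6).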

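Note: matching against the target image is done by voting.

Deformation tolerant GHT (DTGHT)

$I$: the edge map of the image; $S$: the user-drawn sketch; $Seg$: the set of line segments of $I$; $T$: the R-table; $m$: the cardinality of $T$ ($m = \#T = \#S$).

R-table: if $p_k$ is a point of $S$, then $T[k] = p_r - p_k$, $p_r$ being the centroid of $S$.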
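The DTGHT table definition translates directly into code. This sketch assumes the sketch S is a list of integer pixel coordinates and rounds the centroid to the nearest pixel so that T holds integer offsets (an assumption, not stated in the slides); `build_dtght_table` is a hypothetical name.

```python
def build_dtght_table(sketch):
    """Build the DTGHT table T with T[k] = p_r - p_k.

    sketch: list of (x, y) points of S; p_r is the centroid of S, rounded to
    the nearest pixel here (assumption) so that T holds integer offsets.
    T is a flat list of m = #S offsets, not indexed by direction.
    """
    m = len(sketch)
    pr_x = round(sum(x for x, _ in sketch) / m)
    pr_y = round(sum(y for _, y in sketch) / m)
    return [(pr_x - x, pr_y - y) for x, y in sketch]
```

For the four corners of a 2 × 2 square, the centroid is (1, 1) and each entry of T points from a corner to it.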

Deformation tolerant GHT (DTGHT)

$\theta_I(p)$ and $\theta_S[k]$ denote, respectively, the direction of the point $p$ in $I$ and of $p_k$ in $S$. To improve accuracy, $\theta_I(p)$ and $\theta_S[k]$ are computed from adjacent points of the same segment using the following formula:

where the constant is set to 10 and $p_j$ is the $j$th point of a given segment $s$ (and analogously for $S$).
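As an assumption only (not necessarily the paper's formula), a common way to get a stable direction from adjacent points is to take the orientation of the chord joining the points 10 positions before and after p_j on the same segment, clamped at the segment ends. The helper `segment_directions` and its `delta` parameter are hypothetical.

```python
import math


def segment_directions(segment, delta=10):
    """Direction of each point of a segment, estimated from adjacent points.

    Assumption: the direction at p_j is the orientation of the chord from
    p_(j - delta) to p_(j + delta), clamped to the segment ends, folded to [0, pi).
    """
    n = len(segment)
    dirs = []
    for j in range(n):
        ax, ay = segment[max(j - delta, 0)]
        bx, by = segment[min(j + delta, n - 1)]
        # Orientation is undirected, so fold the angle into [0, pi).
        dirs.append(math.atan2(by - ay, bx - ax) % math.pi)
    return dirs
```

Using points several positions apart smooths out the staircase effect of pixel chains, which is the accuracy concern the slide mentions.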

Deformation tolerant GHT (DTGHT)

Nevertheless, we do not use $\theta_S[k]$ to index $T$ as in the original GHT. In fact, we are looking for a shape $S'$ contained in $I$ which is similar, but not necessarily identical, to $S$. Hence, we usually expect that a point $p'$ in $S'$ and a corresponding point $p$ in $S$ may be quite differently oriented.

Voting Procedure

We perform a vote operation analogous to the original GHT voting phase.

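For each point $p = (x, y)$ of $I$ and each entry $(x_i, y_i) = T[i]$, a vote is cast for the cell $(x + x_i, y + y_i)$; a constant of the voting rule is set to $\pi/8$. The result is a vote distribution in the accumulator space $A$.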
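A minimal sketch of the voting step, assuming each point p of I votes for p + T[i] for every entry of T. The π/8 constant is not modelled here because its exact role in the voting rule is not spelled out in the slides. The accumulator A is a `collections.Counter` of cells, and `dtght_vote` is a hypothetical name.

```python
from collections import Counter


def dtght_vote(edge_points, T, width, height):
    """Accumulate votes in A: each point p of I votes for p + T[i], for all i.

    Simplification (assumption): no direction test is applied to a vote.
    """
    A = Counter()
    for x, y in edge_points:
        for xi, yi in T:
            cx, cy = x + xi, y + yi
            if 0 <= cx < width and 0 <= cy < height:  # keep votes inside the image
                A[(cx, cy)] += 1
    return A
```

With T built from a 2 × 2 square and the same square placed at (5, 5) in the image, the true centroid (6, 6) collects one vote from every point, while any other cell gets at most two.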

Cluster the Votes in A

Fixed vote dispersion window $W$: let $W$ be a $(2l+1) \times (2l+1)$ square mask ($l$ is defined below). $W(p)$ is the set of all the nonzero cells of $A$ contained in the mask $W$ when its center is positioned at $p$. The mass $M(p)$ of $W(p)$ is the sum of the values of the elements of $W(p)$. The maximum of $M(p)$ corresponds to the mass of the region with the highest concentration of votes.

Compute M(p)

$M(p)$ is built incrementally using a technique similar to the integral image.

Wi(p) represents the nonzero elements of the ith column of the mask W(p).

Compute M(p)

Note: the incremental computation speeds up the evaluation of $M(p)$ over the $(2l+1) \times (2l+1)$ window.

Now let $C(x, y)$ be the cumulative row sum computed with respect to the $y$th column of $A$.
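A minimal sketch of the mass computation, assuming A is a dense h × w list of lists indexed as A[y][x] and that the window is clipped at the image border. It precomputes cumulative sums down each column (playing a role similar to C) and then slides the window along each row, adding the entering column strip and subtracting the leaving one, so each M(p) costs O(1) after preprocessing. This is one way to obtain an integral-image-style speed-up, not necessarily the paper's exact recurrence; `window_masses` and `best_center` are hypothetical names.

```python
def window_masses(A, l):
    """M[y][x] = sum of A over the (2l+1) x (2l+1) window centred at (x, y)."""
    h, w = len(A), len(A[0])
    # P[r][c] = A[0][c] + ... + A[r-1][c]: cumulative sums down each column.
    P = [[0] * w for _ in range(h + 1)]
    for r in range(h):
        for c in range(w):
            P[r + 1][c] = P[r][c] + A[r][c]
    M = [[0] * w for _ in range(h)]
    for y in range(h):
        y1, y2 = max(0, y - l), min(h - 1, y + l)
        # Vertical strip sums for rows y1..y2, one per column, in O(1) each.
        col = [P[y2 + 1][c] - P[y1][c] for c in range(w)]
        s = sum(col[0:min(w, l + 1)])  # window centred at x = 0
        M[y][0] = s
        for x in range(1, w):
            if x + l < w:
                s += col[x + l]          # column entering the window
            if x - l - 1 >= 0:
                s -= col[x - l - 1]      # column leaving the window
            M[y][x] = s
    return M


def best_center(M):
    """P = arg max_p M(p), returned as (x, y)."""
    h, w = len(M), len(M[0])
    return max(((x, y) for y in range(h) for x in range(w)),
               key=lambda p: M[p[1]][p[0]])
```

Zero cells add nothing to a sum, so summing the whole window gives the same mass as summing only the nonzero cells of W(p).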

Compute M(p)

If $P = \arg\max_{p \in I} M(p)$, then with high probability $P$ is the point of $I$ corresponding to the centroid of the shape most similar to $S$. Since the deformation tolerance area delimits the region within which the points of $S_P$ may vary, the parameter $l$ determines the size of the shape details that the system ignores in the matching process. We set $l = 0.03\,d$, where $d$ is the diagonal of $I$ (in our trials $l = 12$, which gives a window side of $2l + 1 = 25$ pixels).

Example of system's output

Line segment matching

$S_P$ is the projection of $S$ on $I$, with $P$ as its center of mass. Thick textured regions and cluttered backgrounds can randomly concentrate their votes in a single point that does not actually correspond to a shape $S'$ similar to $S$.

Line segment matching: Extraneous vs. Valid Segments

A point $p$ of $s_i \in Seg$ is a valid point if

i is a valid hypothesis for p.

We call a segment $s_i$ a valid segment if $\#V_i \geq k_1 \times \#s_i$, where $k_1 = 0.7$ and $V_i$ is the set of all the valid points of the segment $s_i$.

Line segment matching

A point $p$ of $s_i \in Seg$ is a nearby point if

We call a segment $s_i$ an extraneous segment if $s_i$ is not a valid segment and $\#N_i \geq k_2 \times \#s_i$, where $k_2 = 0.2$ and $N_i$ is the set of all the nearby points of $s_i$. Let $V$ be the subset of $Seg$ composed of all the valid segments, and let $E$ be the subset of $Seg$ composed of all the extraneous segments.
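The segment-level rules can be sketched independently of the point-level tests. This sketch takes the valid-point and nearby-point conditions as caller-supplied predicates (an assumption) and only implements the thresholds k1 = 0.7 and k2 = 0.2, using ≥ for both. `classify_segments` is a hypothetical name.

```python
def classify_segments(segments, is_valid_point, is_nearby_point, k1=0.7, k2=0.2):
    """Split the segments of Seg into valid (V) and extraneous (E) ones.

    segments: list of segments, each a list of (x, y) points.
    is_valid_point / is_nearby_point: predicates on a point (assumed given).
    Segments that are neither valid nor extraneous are simply ignored.
    """
    V, E = [], []
    for s in segments:
        n_valid = sum(1 for p in s if is_valid_point(p))
        if n_valid >= k1 * len(s):
            V.append(s)
        elif sum(1 for p in s if is_nearby_point(p)) >= k2 * len(s):
            E.append(s)
    return V, E
```

A 10-point segment with 8 valid points is valid; one with only 3 valid points but 5 nearby points is extraneous; one with no valid points and a single nearby point falls in neither set.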

Matching Test

Similarity

Example: $Sim = 0.89$.

Similarity rank

The DTGHT, like the original GHT, is neither rotation nor scale invariant.

In the off-line preprocessing of each database image, we produce a pyramidal representation of $I$ composed of 5 different resolution levels.

Similarity rank

The final scale-invariant similarity estimation (SISim) between $I$ and $S$ is given by

We can suppose that the user usually draws a sketch with its expected orientation (e.g., a horizontal car or horse, a vertical tree), so rotation invariance can often be ignored in order to speed up the system.

Note: the database images are ranked according to SISim.

Similarity rank

Results

Computational complexity

$n$ is the number of edge pixels of $I$, $N_1 = w \times h$, $m = \#S$, $N = \#Seg$, and $k$ is the number of scale iterations ($N \ll n, N_1$).

Cost components: R-table construction, voting phase, search for the maximum of $M$, and construction of the sets $V$ and $E$ (Extraneous vs. Valid Segments and the Matching Test).

Computational complexity

The worst-case computational cost of the original GHT is $O(h(nm + N_1))$, with $h$ iterations for different discrete values of scale. From this comparison, we can state that the DTGHT and the GHT have the same asymptotic worst-case behavior. Moreover, the DTGHT needs fewer iterations than the GHT to deal with the same range of scale changes (i.e., $k < h$).

Experimental results

We have implemented our method with non-optimized Java code and tested it on a Pentium IV, 1.7 GHz.
Processing time: less than 2 s per image, one second on average.
Images range from 200 × 200 up to 380 × 350 pixels.
The times include 5 different iterations per image, one for each of the 5 image scale values, and do not include the preprocessing.

Experimental results

The system's database is composed of 283 images randomly taken from the Web.

No manual segmentation has been performed on the images to separate the objects of interest from their background or from other adjacent or occluding objects. Lighting conditions and noise level are also not fixed.

Experimental results: Comparison to other approaches

In this comparison, we do not apply scale iterations; instead, we use the object's minimum enclosing rectangle to set the scale parameters.

Kimia dataset.

Note: the Kimia dataset contains isolated shapes.

Experimental results: Comparison to other approaches

We obtained the second-best result. Our system is the only one among those listed in Table 2 that can be reliably applied to images containing occlusions and non-uniform backgrounds.

Experimental results: Comparison to other approaches

Caltech 101 dataset, composed of real images with significant texture and clutter.

The time needed to process 160 images for a given query was about 140 seconds, including 5 different scale iterations per image.

Conclusion

Conclusion

The DTGHT is an effective technique for dealing with the two main problems in sketch-based image retrieval: image segmentation and inexact matching.

We have shown that inexact matching can be realized using a large vote dispersion window, and that a dynamic programming approach makes this process efficient.

Conclusion

Segmentation is further obtained by comparing the sketch with the candidate image lines.

We have also shown how, unlike most existing sketch-based image retrieval approaches, the DTGHT is able to efficiently deal with images with cluttered backgrounds.

Thank You!
