bundling features for large scale partial-duplicate web image search

Bundling Features for Large Scale Partial-Duplicate Web Image Search

Zhong Wu, Qifa Ke, Michael Isard, and Jian Sun

CVPR 2009 Citations: 163

Outline

Introduction

Bundled features

Image Retrieval using bundled feature

Experiments and results

Conclusion

INTRODUCTION

Target

• Given a query image, locate its near- and partial-duplicate images in a large corpus of web images.

Novel Scheme

• Each group of bundled features becomes much more discriminative than a single feature.
• Within each group, simple and robust geometric constraints can be efficiently enforced.

BUNDLED FEATURES

Related Work

• SIFT (Scale-Invariant Feature Transform)
    ◦ Keypoint and descriptor computed from the region centered at the keypoint
• MSER (Maximally Stable Extremal Regions)
    ◦ Affine-covariant stable region, plus a SIFT descriptor computed from the region

Bundle Features

• SIFT features: S = {s_j}
• MSER detections: R = {r_i}
• Define the bundled features B = {b_i}:
    b_i = {s_j | s_j ∝ r_i, s_j ∈ S}
  where s_j ∝ r_i means that the point feature s_j falls inside the region r_i.
• We discard any MSER detection whose ellipse spans more than half the width or height of the image.
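The bundling rule above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the ellipse representation `(cx, cy, a, b, theta)`, the helper names, and the choice to drop empty bundles are all assumptions made for the example.

```python
import math

def inside_ellipse(point, ellipse):
    """True if (x, y) lies inside the ellipse (cx, cy, a, b, theta)."""
    x, y = point
    cx, cy, a, b, theta = ellipse
    dx, dy = x - cx, y - cy
    # Rotate the offset into the ellipse's own axis frame.
    u = dx * math.cos(theta) + dy * math.sin(theta)
    v = -dx * math.sin(theta) + dy * math.cos(theta)
    return (u / a) ** 2 + (v / b) ** 2 <= 1.0

def ellipse_extent(ellipse):
    """Width and height of the ellipse's axis-aligned bounding box."""
    _, _, a, b, theta = ellipse
    c, s = math.cos(theta), math.sin(theta)
    width = 2.0 * math.sqrt((a * c) ** 2 + (b * s) ** 2)
    height = 2.0 * math.sqrt((a * s) ** 2 + (b * c) ** 2)
    return width, height

def bundle_features(sift_points, mser_ellipses, image_size):
    """Return one list of SIFT indices per retained MSER region."""
    img_w, img_h = image_size
    bundles = []
    for ellipse in mser_ellipses:
        w, h = ellipse_extent(ellipse)
        # Discard regions spanning more than half the image width or height.
        if w > img_w / 2 or h > img_h / 2:
            continue
        members = [j for j, pt in enumerate(sift_points)
                   if inside_ellipse(pt, ellipse)]
        if members:  # assumption: empty bundles are not kept
            bundles.append(members)
    return bundles

# Hypothetical data: three keypoints, one small region, one oversized region.
points = [(10, 10), (12, 11), (80, 80)]
ellipses = [(11, 10, 5, 3, 0.0), (50, 50, 60, 60, 0.0)]
bundles = bundle_features(points, ellipses, image_size=(100, 100))
```

Here the second ellipse is rejected by the size rule, and the first one bundles the two keypoints that fall inside it.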

IMAGE RETRIEVAL USING

BUNDLED FEATURE

Feature quantization

• Hierarchical k-means
    ◦ One million visual words from 50K training images
• K-D tree
    ◦ pointList = [(2,3), (5,4), (9,6), (4,7), (8,1), (7,2)]
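The small point list on the slide can be used to illustrate how a k-d tree is built. The sketch below assumes the common construction that splits at the median and cycles through the axes with depth; the `Node` type and function name are illustrative only.

```python
from collections import namedtuple

Node = namedtuple("Node", ["location", "left", "right"])

def build_kdtree(points, depth=0):
    """Recursively build a k-d tree by median split on alternating axes."""
    if not points:
        return None
    k = len(points[0])            # dimensionality of the points
    axis = depth % k              # cycle through the axes with depth
    ordered = sorted(points, key=lambda p: p[axis])
    median = len(ordered) // 2
    return Node(
        location=ordered[median],
        left=build_kdtree(ordered[:median], depth + 1),
        right=build_kdtree(ordered[median + 1:], depth + 1),
    )

point_list = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(point_list)
```

With this construction the root splits on x at (7, 2), and its children split on y at (5, 4) and (9, 6). For a vocabulary of a million visual words the same structure would hold descriptor vectors rather than 2-D points.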

Matching bundled features

Let p = {p_i} and q = {q_j} be two bundled features with quantized visual words p_i, q_j ∈ W, where W is the visual vocabulary.

• Define a matching score:
    M(q; p) = M_m(q; p) + λ M_g(q; p)
  where M_m is the membership term, M_g is the geometric term, and λ is a weighting parameter.

• Membership term:
    ◦ We simply use the number of common visual words between two bundled features to define the membership term M_m(q; p):
        M_m(q; p) = |{p_i}|
      where {p_i} is restricted to the visual words of p that also occur in q.
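As a small illustration of the membership term, the sketch below counts the visual words of p that also occur in q. The function name and the integer word IDs are made up for the example.

```python
def membership_term(p_words, q_words):
    """Number of visual words in bundle p that also appear in bundle q."""
    q_set = set(q_words)
    return sum(1 for w in p_words if w in q_set)

# Hypothetical bundles given as lists of visual-word IDs.
p = [3, 7, 9, 12]
q = [7, 12, 40]
m_m = membership_term(p, q)
```

In this example the two bundles share the words 7 and 12, so the membership term is 2.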

• Geometric term:
    ◦ Our geometric term performs a weak geometric verification between two bundled features p and q using relative ordering:
      [Equation shown as an image on the slide; it is defined in terms of an indicator function.]
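One way to read "relative ordering" with an indicator function is as a count of order inversions: sort the matched words of p along an axis, look up their ranks in q, and penalise each adjacent pair whose order is reversed. The sketch below follows that reading. It is an assumption rather than the slide's exact formula, as are the helper names, the use of the worse of the x and y axes, the default λ, and the simplification that each visual word occurs at most once per bundle.

```python
def inversions(p_feats, q_feats, axis):
    """Count adjacent order inversions along one axis (0 = x, 1 = y).

    Features are (word, (x, y)) pairs; words are assumed unique per bundle.
    """
    q_sorted = sorted(q_feats, key=lambda f: f[1][axis])
    q_rank = {word: rank for rank, (word, _) in enumerate(q_sorted)}
    p_sorted = sorted(p_feats, key=lambda f: f[1][axis])
    # Ranks in q of the words of p that have a match, in p's order.
    ranks = [q_rank[word] for word, _ in p_sorted if word in q_rank]
    return sum(1 for a, b in zip(ranks, ranks[1:]) if a > b)

def geometric_term(p_feats, q_feats):
    """Negative inversion count, taking the worse of the two axes."""
    return min(-inversions(p_feats, q_feats, 0),
               -inversions(p_feats, q_feats, 1))

def matching_score(p_feats, q_feats, lam=2.0):
    """M(q; p) = M_m(q; p) + lam * M_g(q; p); lam is a hypothetical value."""
    q_words = {word for word, _ in q_feats}
    m_m = sum(1 for word, _ in p_feats if word in q_words)
    return m_m + lam * geometric_term(p_feats, q_feats)

p_bundle = [(1, (0, 0)), (2, (1, 1)), (3, (2, 2))]
q_same = [(1, (0, 0)), (2, (1, 1)), (3, (2, 2))]
q_swapped = [(1, (0, 0)), (3, (1, 1)), (2, (2, 2))]

score_same = matching_score(p_bundle, q_same)
score_swapped = matching_score(p_bundle, q_swapped)
```

Both candidate bundles share all three words with p, but the one with two words swapped incurs one inversion on each axis and therefore scores lower.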

Indexing and retrieval

• Avoids storing and comparing high-dimensional local descriptors
• Reduces the number of candidate images
• Voting
• tf
    ◦ A document contains 100 words, and the word 'a' occurs 3 times
    ◦ tf = 3/100 = 0.03
• idf
    ◦ 1,000 documents contain 'a', out of 10,000,000 documents in total
    ◦ idf = ln(10,000,000 / 1,000) ≈ 9.21
• tf-idf = 0.03 × 9.21 ≈ 0.28
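The tf-idf arithmetic on this slide can be checked with a short script. The function names are illustrative; the numbers are the ones quoted on the slide.

```python
import math

def term_frequency(term_count, total_terms):
    """Fraction of the document's words that are the given term."""
    return term_count / total_terms

def inverse_document_frequency(total_docs, docs_with_term):
    """Natural log of the ratio of all documents to documents with the term."""
    return math.log(total_docs / docs_with_term)

tf = term_frequency(3, 100)
idf = inverse_document_frequency(10_000_000, 1_000)
tf_idf = tf * idf

print(f"tf = {tf:.2f}, idf = {idf:.2f}, tf-idf = {tf_idf:.2f}")
```

In image retrieval the "documents" are images and the "words" are visual words, so rare visual words contribute more to a vote than common ones.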

EXPERIMENTS AND RESULTS

Dataset

• Basic dataset
    ◦ One million images most frequently clicked in a popular commercial image-search engine
    ◦ Smaller subsets: 50K, 200K, 500K
• Ground truth
    ◦ 780 manually labeled partial-duplicate web images from 19 groups
• Evaluation dataset = basic dataset + ground truth
• Query
    ◦ 150 images from the ground truth

Evaluation

• Baseline
    ◦ Bag-of-features approach with soft assignment [13]

[13] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Lost in quantization: Improving particular object retrieval in large scale image databases. In CVPR, 2008.

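• Compare (HE)
    ◦ Enhance the bundled-feature approach with Hamming embedding [3] by adding a 24-bit Hamming code to each feature, used to filter out target features that share a visual word with a query feature but lie far from it in Hamming distance.

[3] H. Jegou, M. Douze, and C. Schmid. Hamming embedding and weak geometric consistency for large scale image search. In ECCV, 2008.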
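The filtering step of Hamming embedding can be sketched as follows, assuming each feature already carries a 24-bit binary code stored as an integer. The threshold value, the feature IDs, and the function names are hypothetical, and how the codes are computed is not covered here.

```python
def hamming_distance(code_a, code_b):
    """Number of differing bits between two integer-encoded binary codes."""
    return bin(code_a ^ code_b).count("1")

def filter_by_hamming(query_code, candidates, max_distance):
    """Keep candidates whose code is within max_distance bits of the query.

    candidates: list of (feature_id, code) pairs that already share the
    query feature's visual word.
    """
    return [feature_id for feature_id, code in candidates
            if hamming_distance(query_code, code) <= max_distance]

# Hypothetical 24-bit codes.
query = 0xABCDEF
candidates = [("f1", 0xABCDEE), ("f2", 0x543210)]
kept = filter_by_hamming(query, candidates, max_distance=6)
```

Here "f1" differs from the query in a single bit and is kept, while "f2" differs in all 24 bits and is filtered out.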

• mAP: Baseline 0.35 to Bundled (mem, membership term only) 0.40, a 14% improvement
• mAP: Baseline 0.35 to Bundled 0.49, a 40% improvement
• mAP: Baseline 0.35 to Bundled+HE 0.52, a 49% improvement

• Compare (re-ranking)
    ◦ Full geometric verification with RANSAC on the top 300 candidate images
    ◦ Baseline+re-rank 0.50 to Bundled+re-rank 0.62, a 24% improvement
    ◦ Baseline 0.35 to Bundled+re-rank 0.62, a 77% improvement
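The percentages quoted on the result slides are relative improvements over the starting score. A short check, using only the scores stated on the slides (the function name is illustrative):

```python
def relative_improvement(old, new):
    """Relative gain of new over old, in percent."""
    return (new - old) / old * 100.0

comparisons = {
    "Baseline -> Bundled (mem)": (0.35, 0.40),
    "Baseline -> Bundled": (0.35, 0.49),
    "Baseline -> Bundled+HE": (0.35, 0.52),
    "Baseline+re-rank -> Bundled+re-rank": (0.50, 0.62),
    "Baseline -> Bundled+re-rank": (0.35, 0.62),
}

gains = {name: round(relative_improvement(old, new))
         for name, (old, new) in comparisons.items()}

for name, gain in gains.items():
    print(f"{name}: {gain}%")
```

The rounded values agree with the 14%, 40%, 49%, 24%, and 77% figures given on the slides.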

• Trade-off
• Run time
    ◦ A single CPU on a 3.0 GHz Core Duo desktop with 16 GB of memory

Sample results

[Figure: a query image, with the results returned by the baseline approach and by our approach]

CONCLUSION

Conclusion

• Properties of bundled features
    ◦ More discriminative than individual SIFT features
    ◦ Simple and robust geometric constraints
    ◦ Partially match two groups of SIFT features
• Advantage
    ◦ Robustness to occlusion and to photometric and geometric changes

Thanks for listening
