![Page 1: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/1.jpg)
Visual Object Recognition
Perceptual Computing Seminar
Sergio Escalera, Xavier Baró, Jordi Vitrià, Petia Radeva, Oriol Pujol
BCN Perceptual Computing Lab
![Page 2: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/2.jpg)
Index
1. Introduction
2. Recognition with Local Features: Basics.
3. Invariant representations: SIFT
4. Recognition as a Classification Problem: FERNS
5. Very large databases: Hashing
Visual Object Recognition Perceptual Computing Seminar Page 2
![Page 3: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/3.jpg)
Introduction
The recognition of object categories in images is one of the most challenging problems in computer vision, especially when the number of categories is large.
Humans are able to recognize thousands of object types, whereas most of the existing object recognition systems are trained to recognize only a few.
![Page 4: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/4.jpg)
Introduction
Invariance to viewpoint, illumination, “shape”, color, scale, texture, etc.
![Page 5: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/5.jpg)
Introduction
Why do we care about recognition? (theoretical question)
Perception of function: we can perceive 3D shape, texture, and material properties without knowing about objects. But the concept of a category also encapsulates information about what we can do with those objects.
Li Fei‐Fei, Stanford; Rob Fergus, NYU; Antonio Torralba, MIT. Recognizing and Learning Object Categories: Year 2009, ICCV 2009 Kyoto, Short Course, September 24.
![Page 6: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/6.jpg)
Introduction
Why is it hard?
Find the chair in this image, given a template labeled “This is a chair”; the second panel shows the output of correlation.
Li Fei‐Fei, Stanford; Rob Fergus, NYU; Antonio Torralba, MIT. Recognizing and Learning Object Categories: Year 2009, ICCV 2009 Kyoto, Short Course, September 24.
![Page 7: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/7.jpg)
Introduction
Why is it hard?
Find the chair in this image. The output of correlation is pretty much garbage: simple template matching is not going to make it.
Li Fei‐Fei, Stanford; Rob Fergus, NYU; Antonio Torralba, MIT. Recognizing and Learning Object Categories: Year 2009, ICCV 2009 Kyoto, Short Course, September 24.
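To make the failure concrete, the correlation output can be reproduced with normalized cross-correlation (NCC). The following is a minimal NumPy sketch, assuming grayscale float images; `ncc_map` is a hypothetical helper, not code from the course. It scores every placement of the template by the correlation coefficient between the template and the underlying window.

```python
import numpy as np

def ncc_map(image, template):
    """Normalized cross-correlation of `template` at every valid position in `image`.

    Brute-force sketch (no FFT); both inputs are 2D float arrays.
    """
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    out_h = image.shape[0] - th + 1
    out_w = image.shape[1] - tw + 1
    scores = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            window = image[r:r + th, c:c + tw]
            w = window - window.mean()
            denom = np.sqrt((w ** 2).sum()) * t_norm
            # Flat windows (zero variance) get a score of 0.
            scores[r, c] = (w * t).sum() / denom if denom > 0 else 0.0
    return scores

rng = np.random.default_rng(0)
img = rng.random((40, 40))
tmpl = img[10:18, 5:13].copy()  # template cut out of the image itself
s = ncc_map(img, tmpl)
r, c = np.unravel_index(np.argmax(s), s.shape)
print(int(r), int(c))  # 10 5
```

On an exact copy of the template the peak score is 1, but as the chair example shows, the score map becomes unreliable as soon as viewpoint, illumination, or the object instance itself changes.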
![Page 8: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/8.jpg)
Introduction
Why do we care about recognition? (practical question)
![Page 9: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/9.jpg)
Introduction
Why do we care about recognition? (practical question)
![Page 10: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/10.jpg)
Introduction
Why do we care about recognition? (practical question)
Query results from 5k Flickr images (demo available for 100k set)
James Philbin, Ondrej Chum, Michael Isard, Josef Sivic, Andrew Zisserman: Object retrieval with large vocabularies and fast spatial matching. CVPR 2007
![Page 11: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/11.jpg)
Recognition with Local Features
It is known that the visual system can use local, informative image «fragments» of a given object, rather than the whole object, to classify it into a familiar category.
This approach has some advantages over holistic methods...
![Page 12: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/12.jpg)
Recognition with Local Features
Holistic vs. fragment‐based
![Page 13: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/13.jpg)
Recognition with Local Features
Jay Hegde, Evgeniy Bart, and Daniel Kersten, "Fragment‐based learning of visual object categories", Current Biology, 2008.
![Page 14: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/14.jpg)
Recognition with Local Features
The most basic approach is called the “bag of words” approach (it was inspired by techniques used by the natural language processing community).
![Page 15: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/15.jpg)
Recognition with Local Features
Assumptions:
• Independent features.
• Histogram representation.
Fragments vocabulary (generic, class‐based, etc.)
Image = fragments histogram
Li Fei‐Fei, Stanford; Rob Fergus, NYU; Antonio Torralba, MIT. Recognizing and Learning Object Categories: Year 2009, ICCV 2009 Kyoto, Short Course, September 24.
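A minimal NumPy sketch of the histogram representation, assuming a vocabulary of visual words is already available (for example, cluster centres learned from training descriptors; how the vocabulary is built is an assumption here, not something this slide specifies). Each descriptor is assigned to its nearest word, and the image is summarized by the normalized word counts. `bow_histogram` is a hypothetical name.

```python
import numpy as np

def bow_histogram(descriptors, vocabulary):
    """Quantize each descriptor to its nearest visual word and count occurrences.

    descriptors: (n, d) array of local features from one image (n >= 1).
    vocabulary:  (k, d) array of word centres (the "fragments vocabulary").
    Returns a length-k histogram normalized to sum to 1.
    """
    # Squared Euclidean distance between every descriptor and every word: (n, k).
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()

vocab = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
feats = np.array([[0.5, -0.2], [9.0, 1.0], [1.0, 0.3], [0.2, 11.0]])
print(bow_histogram(feats, vocab))  # word counts 2, 1, 1 -> proportions 0.5, 0.25, 0.25
```

The positions of the fragments are discarded entirely, which is exactly the "independent features" assumption: two images with the same word counts get the same representation regardless of layout.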
![Page 16: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/16.jpg)
Recognition with Local Features
A more advanced approach involves several steps:
• Stage 0: Find image locations where we can reliably find correspondences with other images.
• Stage 1: Image content is transformed into local features (that are invariant to translation, rotation, and scale).
• Stage 2: Verify if they belong to a consistent configuration.
Slide credit: David Lowe
![Page 17: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/17.jpg)
SIFT
A wonderful example of these stages can be found in David Lowe’s (2004) “Distinctive image features from scale‐invariant keypoints” paper, which describes the development and refinement of his Scale Invariant Feature Transform (SIFT).
Local Features, e.g. SIFT
![Page 18: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/18.jpg)
Recognition with Local Features
Which local features?
Slide credit: A. Efros
![Page 19: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/19.jpg)
SIFT
Stage 0: How can we find image locations where we can reliably find correspondences with other images?
A “good” location has one stable sharp extremum.
[Figure: three plots of f against x; one labeled “Good!”, two labeled “bad”]
![Page 20: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/20.jpg)
SIFT
![Page 21: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/21.jpg)
SIFT
Stage 0: How can we find image locations where we can reliably find correspondences with other images?
How to compute extrema at a given scale:
1) We apply a Gaussian filter: $L(x, y, \sigma) = G(x, y, \sigma) * I(x, y)$, where $*$ denotes convolution with the input image $I$.
2) We compute a difference‐of‐Gaussians: $D(x, y, \sigma) = L(x, y, k\sigma) - L(x, y, \sigma)$, for a constant scale factor $k$.
3) We look for 3D extrema (over $x$, $y$, and $\sigma$) in the resulting structure.
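A single-octave sketch of these three steps, written with NumPy only. The base scale `sigma0=1.6`, the factor `k=sqrt(2)`, the number of levels, and the contrast `threshold` are assumed values for illustration, and the sketch omits the sub-pixel refinement and edge-response rejection that Lowe's paper adds. `gaussian_blur` and `dog_extrema` are hypothetical helpers.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian filter, kernel truncated at 3 sigma, edge padding."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    padded = np.pad(img, radius, mode="edge")
    # Filter rows, then columns; mode="valid" removes the padding again.
    tmp = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, tmp)

def dog_extrema(img, sigma0=1.6, k=2 ** 0.5, levels=5, threshold=0.01):
    """Extrema of D(x, y, sigma) = L(x, y, k*sigma) - L(x, y, sigma) in one octave.

    Returns (x, y, sigma) tuples; x is the column index, y the row index.
    """
    sigmas = [sigma0 * k ** i for i in range(levels)]
    L = np.stack([gaussian_blur(img, s) for s in sigmas])   # (levels, H, W)
    D = L[1:] - L[:-1]                                        # D[i] belongs to sigmas[i]
    points = []
    for s in range(1, D.shape[0] - 1):
        for y in range(1, D.shape[1] - 1):
            for x in range(1, D.shape[2] - 1):
                v = D[s, y, x]
                # 3x3x3 neighbourhood across space and the two adjacent scales.
                cube = D[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
                if abs(v) > threshold and (v == cube.max() or v == cube.min()):
                    points.append((x, y, sigmas[s]))
    return points

yy, xx = np.mgrid[0:64, 0:64]
img = np.exp(-((xx - 32) ** 2 + (yy - 32) ** 2) / (2 * 3.0 ** 2))  # bright blob, sigma 3
for x, y, sigma in dog_extrema(img):
    if (x, y) == (32, 32):
        print("centre extremum at sigma", round(sigma, 2))
```

For a bright blob on a dark background the centre shows up as a minimum of D (more blurring makes the centre darker), which is why both minima and maxima are kept.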
![Page 22: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/22.jpg)
SIFT
![Page 23: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/23.jpg)
SIFT
These features are invariant to location and scale.
![Page 24: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/24.jpg)
SIFT
Stage 1: Image content is transformed into local features (that are invariant to translation, rotation, and scale).
In addition to dealing with scale changes, we need to deal with (at least) in‐plane image rotation.
One way to deal with this problem is to design descriptors that are rotationally invariant, but such descriptors have poor discriminability, i.e. they map different‐looking patches to the same descriptor.
![Page 25: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/25.jpg)
SIFT
A better method is to estimate a dominant orientation at each detected keypoint.
1. Calculate a histogram of local gradients in the window.
2. Take the dominant orientation gradient as “up”.
3. Rotate the local area before computing the descriptor.
![Page 26: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/26.jpg)
SIFT
Lowe:
• computes a 36‐bin histogram of edge orientations weighted by both gradient magnitude and Gaussian distance to the center,
• finds all peaks within 80% of the global maximum, and then
• computes a more accurate orientation estimate using a 3‐bin parabolic fit.
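A sketch of these three steps in NumPy. The Gaussian weighting width (half the patch width) is an assumption of this sketch (Lowe ties it to 1.5 times the keypoint scale), gradients come from `np.gradient` central differences, and the patch is assumed to be already sampled at the keypoint's scale. `dominant_orientations` is a hypothetical name. Several peaks can survive the 80% test, in which case each one yields its own keypoint orientation.

```python
import numpy as np

def dominant_orientations(patch, num_bins=36, peak_ratio=0.8):
    """Return dominant gradient orientations (degrees) of a patch centred on a keypoint."""
    dy, dx = np.gradient(patch)                 # derivatives along rows (y) and columns (x)
    mag = np.hypot(dx, dy)
    ang = np.degrees(np.arctan2(dy, dx)) % 360.0
    # Gaussian weight by distance to the patch centre (sigma = half the width, assumed).
    h, w = patch.shape
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    sigma = 0.5 * w
    weight = np.exp(-((xx - cx) ** 2 + (yy - cy) ** 2) / (2 * sigma ** 2))
    # 36-bin histogram weighted by gradient magnitude and Gaussian distance.
    bin_width = 360.0 / num_bins
    bins = (ang // bin_width).astype(int) % num_bins
    hist = np.bincount(bins.ravel(), weights=(mag * weight).ravel(), minlength=num_bins)

    orientations = []
    for i in range(num_bins):
        left, right = hist[(i - 1) % num_bins], hist[(i + 1) % num_bins]
        # Local peaks within 80% of the global maximum.
        if hist[i] > left and hist[i] > right and hist[i] >= peak_ratio * hist.max():
            # Vertex of the parabola through the 3 bins; offset lies in [-0.5, 0.5] bins.
            offset = 0.5 * (left - right) / (left - 2 * hist[i] + right)
            orientations.append(float(((i + 0.5 + offset) * bin_width) % 360.0))
    return orientations

ramp = np.add.outer(np.arange(16.0), np.arange(16.0))  # intensity = row + column
print(dominant_orientations(ramp))  # [45.0]
```

Angles are measured in array coordinates (x along the columns, y down the rows), so the diagonal ramp in the example has a single dominant orientation of 45 degrees.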
![Page 27: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/27.jpg)
SIFT
![Page 28: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/28.jpg)
SIFT
Local patch around descriptor from Gaussian pyramid
Gradient magnitude and gradient orientation from Gaussian pyramid
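For reference, Lowe's paper computes these two panels from pixel differences of the smoothed image $L$ at the keypoint's scale; the orientation is written here with the quadrant-aware $\operatorname{atan2}$ (the paper writes it as $\tan^{-1}$ of the same ratio):

$$
m(x, y) = \sqrt{\big(L(x+1, y) - L(x-1, y)\big)^2 + \big(L(x, y+1) - L(x, y-1)\big)^2}
$$

$$
\theta(x, y) = \operatorname{atan2}\big(L(x, y+1) - L(x, y-1),\; L(x+1, y) - L(x-1, y)\big)
$$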
![Page 29: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/29.jpg)
SIFT
![Page 30: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/30.jpg)
SIFT
![Page 31: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/31.jpg)
SIFT
Even after compensating for translation, rotation, and scale changes, the local appearance of image patches will usually still vary from image to image.
How can we make the descriptor that we match more invariant to such changes, while still preserving discriminability between different (non‐corresponding) patches?
![Page 32: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/32.jpg)
SIFT
SIFT features are formed by computing the gradient at each pixel in a 16x16 window around the detected keypoint, using the appropriate level of the Gaussian pyramid at which the keypoint was detected.
The gradient magnitudes are downweighted by a Gaussian fall‐off function in order to reduce the influence of gradients far from the center, as these are more affected by small misregistrations.
![Page 33: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/33.jpg)
SIFT
In each 4x4 subregion of the window (16 subregions in total), a gradient orientation histogram is formed by (conceptually) adding the weighted gradient value to one of 8 orientation histogram bins.
![Page 34: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/34.jpg)
SIFT
The resulting 128 non‐negative values form a raw version of the SIFT descriptor vector.
To reduce the effects of contrast/gain (additive variations are already removed by the gradient), the 128‐D vector is normalized to unit length.
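A minimal NumPy sketch of the descriptor layout: a 16x16 window of gradient magnitudes and orientations, Gaussian weighting, 4x4 cells with 8 orientation bins each (4 × 4 × 8 = 128 values), and unit-length normalization. It uses hard bin assignment and assumes the orientations have already been rotated relative to the keypoint's dominant orientation; Lowe's implementation additionally spreads each sample over neighbouring bins. The weighting width (half the window width) follows Lowe's paper, and `sift_like_descriptor` is a hypothetical name.

```python
import numpy as np

def sift_like_descriptor(mag, ang):
    """128-D descriptor from a 16x16 window of gradient magnitudes and orientations.

    mag, ang: (16, 16) arrays; ang in degrees, assumed already expressed relative
    to the keypoint's dominant orientation. Hard binning only (no interpolation).
    """
    assert mag.shape == ang.shape == (16, 16)
    yy, xx = np.mgrid[0:16, 0:16]
    centre = 7.5
    sigma = 8.0  # half the window width
    weight = np.exp(-((xx - centre) ** 2 + (yy - centre) ** 2) / (2 * sigma ** 2))
    bins = ((ang % 360.0) // 45.0).astype(int) % 8      # 8 orientation bins of 45 degrees
    hist = np.zeros((4, 4, 8))
    # Each pixel votes its weighted magnitude into the histogram of its 4x4 cell.
    np.add.at(hist, (yy // 4, xx // 4, bins), mag * weight)
    v = hist.ravel()                                     # 4 * 4 * 8 = 128 values
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v                   # unit length removes gain

rng = np.random.default_rng(0)
desc = sift_like_descriptor(rng.random((16, 16)), rng.uniform(0, 360, (16, 16)))
print(desc.shape, round(float(np.linalg.norm(desc)), 6))  # (128,) 1.0
```

Because the vector is rescaled to unit length, multiplying all gradient magnitudes by a constant (a contrast change) leaves the descriptor unchanged. Lowe's paper goes one step further and clips large entries at 0.2 before renormalizing, to reduce the influence of non-linear illumination changes.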
![Page 35: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/35.jpg)
SIFT
Once we have extracted features and their descriptors from two or more images, the next step is to establish some preliminary feature matches between these images.
![Page 36: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/36.jpg)
SIFT
SIFT uses a nearest neighbor classifier with a distance ratio matching criterion. We can define this nearest neighbor distance ratio as

$$\mathrm{NNDR} = \frac{d_1}{d_2} = \frac{\lVert D_A - D_B \rVert}{\lVert D_A - D_C \rVert},$$

where $d_1$ and $d_2$ are the nearest and second nearest neighbor distances, and $D_A$, $D_B$, $D_C$ are the target descriptor along with its closest two neighbors.
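A brute-force NumPy sketch of matching with this criterion, assuming descriptors are stored as rows of two arrays and the second array has at least two rows. The 0.8 ratio threshold is the value Lowe reports using; treat it as a tunable parameter. `ratio_test_matches` is a hypothetical name.

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, max_ratio=0.8):
    """Match each row of desc_a to desc_b, keeping matches with d1 / d2 < max_ratio."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)   # distance to every candidate
        nn1, nn2 = np.argsort(dists)[:2]             # nearest and second nearest
        if dists[nn1] < max_ratio * dists[nn2]:
            matches.append((i, int(nn1)))
    return matches

desc_b = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
desc_a = np.array([[0.1, 0.0], [5.0, 5.0]])
print(ratio_test_matches(desc_a, desc_b))  # [(0, 0)]
```

The second query is rejected because it is equally far from every candidate, which is exactly the ambiguous case the ratio test is meant to discard. The loop is the linear method discussed next, so its cost grows with the product of the two descriptor counts.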
![Page 37: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/37.jpg)
SIFT
![Page 38: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/38.jpg)
SIFT
Linear method:
The simplest way to find all correspondingfeature points is to compare all featuresagainst all other features in each pair ofpotentially matching images.
Unfortunately, this is quadratic in the number of extracted features, which makes it impractical for some applications.
![Page 39: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/39.jpg)
SIFT
Nearest‐neighbor matching is the major computational bottleneck:
• Linear search performs dn² operations for n feature points and d dimensions.
• No exact NN methods are faster than linear search for d > 10.
• Approximate methods can be much faster, but at the cost of missing some correct matches. The failure rate gets worse for large datasets.
![Page 40: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/40.jpg)
SIFT
A better approach is to devise an indexing structure, such as a multi‐dimensional search tree or a hash table, to rapidly search for features near a given feature.
For extremely large databases (millions of images or more), even more efficient structures based on ideas from document retrieval (e.g., vocabulary trees) can be used.
![Page 41: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/41.jpg)
SIFT
Stage 2: Verify if they belong to a consistent configuration.
The first step is to establish a set of putative correspondences.
![Page 42: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/42.jpg)
SIFT
How can we discard erroneous correspondences?
![Page 43: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/43.jpg)
SIFT
Stage 2: Verify if they belong to a consistent configuration.
Once we have some hypothetical (putative) matches, we can use geometric alignment to verify which matches are inliers and which ones are outliers.
![Page 44: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/44.jpg)
SIFT
Stage 2: Verify if they belong to a consistent configuration.
• Extract features
• Compute putative matches
![Page 45: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/45.jpg)
SIFT
Stage 2: Verify if they belong to a consistent configuration.
• Loop:
  – Hypothesize transformation T (using a small group of putative matches that are related by T)
![Page 46: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/46.jpg)
SIFT
Stage 2: Verify if they belong to a consistent configuration.
• Loop:
  – Hypothesize transformation T (small group of putative matches that are related by T)
  – Verify transformation (search for other matches consistent with T)
![Page 47: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/47.jpg)
SIFT
Stage 2: Verify if they belong to a consistent configuration.
![Page 48: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/48.jpg)
SIFT
Stage 2: Verify if they belong to a consistent configuration.
2D transformation models:
• Similarity (translation, scale, rotation)
• Affine
• Projective (homography)
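The three models, written out in standard form for reference (the symbols $m_1 \dots m_4$, $t_1$, $t_2$ match the affine fit on the following slides; $s$, $\theta$ and $h_{jk}$ are our notation). A similarity has 4 degrees of freedom, an affine transformation 6, and a homography 8 (it is defined only up to scale), so, at two equations per match, they need at least 2, 3, and 4 point matches respectively:

$$
\text{Similarity:}\quad
\begin{bmatrix} x' \\ y' \end{bmatrix} = s \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} t_1 \\ t_2 \end{bmatrix}
$$

$$
\text{Affine:}\quad
\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} m_1 & m_2 \\ m_3 & m_4 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} t_1 \\ t_2 \end{bmatrix}
$$

$$
\text{Projective:}\quad
\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} \simeq \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
$$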
![Page 49: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/49.jpg)
SIFT
Stage 2: Verify if they belong to a consistent configuration.
Fitting an affine transformation (given the point correspondences $(x_i, y_i) \leftrightarrow (x_i', y_i')$):
Slide credit: S. Lazebnik
![Page 50: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/50.jpg)
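SIFT
Stage 2: Verify if they belong to a consistent configuration.
Fitting an affine transformation (given the point correspondences):

$$\begin{bmatrix} x_i' \\ y_i' \end{bmatrix} = \begin{bmatrix} m_1 & m_2 \\ m_3 & m_4 \end{bmatrix} \begin{bmatrix} x_i \\ y_i \end{bmatrix} + \begin{bmatrix} t_1 \\ t_2 \end{bmatrix}$$

Rewritten with the parameters as unknowns, each match contributes two rows:

$$\begin{bmatrix} x_i & y_i & 0 & 0 & 1 & 0 \\ 0 & 0 & x_i & y_i & 0 & 1 \end{bmatrix} \begin{bmatrix} m_1 \\ m_2 \\ m_3 \\ m_4 \\ t_1 \\ t_2 \end{bmatrix} = \begin{bmatrix} x_i' \\ y_i' \end{bmatrix}$$

Slide credit: S. Lazebnik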
![Page 51: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/51.jpg)
SIFT
Stage 2: Verify if they belong to a consistent configuration.
Fitting an affine transformation (given the point correspondences):
• Linear system with six unknowns.
• Each match gives us two linearly independent equations: we need at least three matches to solve for the transformation parameters.
• Can solve $A\mathbf{x} = \mathbf{b}$ using the pseudo‐inverse: $\mathbf{x} = (A^T A)^{-1} A^T \mathbf{b}$
Slide credit: S. Lazebnik
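A NumPy sketch of the least-squares fit, assuming two `(n, 2)` arrays of corresponding points with n ≥ 3 and not all collinear. It builds the stacked system from the previous slide, two rows per match, and solves it with `np.linalg.lstsq`, which minimizes the same squared error as the pseudo-inverse formula without forming $(A^T A)^{-1}$ explicitly. `fit_affine` is a hypothetical name.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine fit dst ~= M @ src + t from (n, 2) point arrays, n >= 3."""
    n = len(src)
    A = np.zeros((2 * n, 6))
    b = np.zeros(2 * n)
    # Row 2i:   [x_i, y_i, 0, 0, 1, 0] . [m1, m2, m3, m4, t1, t2] = x_i'
    # Row 2i+1: [0, 0, x_i, y_i, 0, 1] . [m1, m2, m3, m4, t1, t2] = y_i'
    A[0::2, 0:2] = src
    A[0::2, 4] = 1.0
    A[1::2, 2:4] = src
    A[1::2, 5] = 1.0
    b[0::2] = dst[:, 0]
    b[1::2] = dst[:, 1]
    params, *_ = np.linalg.lstsq(A, b, rcond=None)
    m1, m2, m3, m4, t1, t2 = params
    return np.array([[m1, m2], [m3, m4]]), np.array([t1, t2])

M_true = np.array([[1.2, -0.3], [0.4, 0.9]])
t_true = np.array([5.0, -2.0])
src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 3.0], [-1.0, 4.0]])
dst = src @ M_true.T + t_true                 # exact correspondences
M, t = fit_affine(src, dst)
print(np.allclose(M, M_true), np.allclose(t, t_true))  # True True
```

With noisy matches the same call returns the least-squares compromise, which is why it is used inside RANSAC only on the final inlier set, not on all putative matches.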
![Page 52: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/52.jpg)
![Page 53: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/53.jpg)
SIFT
Stage 2: Verify if they belong to a consistent configuration.
The process of selecting a small set of seed matches and then verifying a larger set is often called random sampling or RANSAC.
![Page 54: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/54.jpg)
RANSAC
RANSAC was originally formulated in Martin A. Fischler and Robert C. Bolles (June 1981). “Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography”. Comm. of the ACM 24: 381–395.
![Page 55: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/55.jpg)
RANSAC
“We approached the fitting problem in the opposite way from most previous techniques. Instead of averaging all the measurements and then trying to throw out bad ones, we used the smallest number of measurements to compute a model’s unknown parameters and then evaluated the instantiated model by counting the number of consistent samples”
From “RANSAC: An Historical Perspective”, Bob Bolles & Marty Fischler, 2006.
![Page 56: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/56.jpg)
RANSAC
It’s easy to understand and it’s effective
• It helps solve a common problem (i.e., filter out gross errors introduced by automatic techniques)
• The number of trials to “guarantee” a high level of success (e.g., 99.99% probability) is surprisingly small
• The dramatic increase in computation speed made it possible to do a large number of trials (100s or 1000s)
• The algorithm can stop as soon as a good match is computed (unlike Hough techniques that typically compute a large number of examples and then identify matches)
From “RANSAC: An Historical Perspective”, Bob Bolles & Marty Fischler, 2006.
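The second bullet can be made precise with the standard back-of-the-envelope bound (a well-known result, not taken from this slide): if a fraction $w$ of the putative matches are inliers and each hypothesis needs $s$ matches, then $N$ random samples contain at least one all-inlier sample with probability $p$ when

$$N = \frac{\log(1 - p)}{\log(1 - w^s)}.$$

A direct Python translation; `ransac_trials` is a hypothetical name, and it assumes $0 < w < 1$:

```python
import math

def ransac_trials(p_success, inlier_ratio, sample_size):
    """Smallest N with P(at least one all-inlier sample in N draws) >= p_success.

    Assumes 0 < inlier_ratio < 1 and 0 < p_success < 1.
    """
    all_inlier = inlier_ratio ** sample_size   # w**s: chance that one sample is clean
    return math.ceil(math.log(1 - p_success) / math.log(1 - all_inlier))

# 99.99% success when half of the matches are inliers:
print(ransac_trials(0.9999, 0.5, 3))  # 69  (affine: 3 matches per sample)
print(ransac_trials(0.9999, 0.5, 4))  # 143 (homography: 4 matches per sample)
```

Even with 50% outliers, fewer than a hundred affine hypotheses suffice, which is the "surprisingly small" number the bullet refers to; the count grows quickly, though, as $w$ drops or $s$ grows.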
![Page 57: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/57.jpg)
RANSAC
The basic idea is to repeat M times the following process, starting each time from a randomly selected subset of the data (the hypothetical inliers):
1. A model is fitted to the hypothetical inliers, i.e. all free parameters of the model are reconstructed from this subset.
2. All other data are then tested against the fitted model; if a point fits the estimated model well, it is also considered a hypothetical inlier.
3. The estimated model is reasonably good if sufficiently many points have been classified as hypothetical inliers.
4. The model is re‐estimated from all hypothetical inliers, because it has only been estimated from the initial set of hypothetical inliers.
5. Finally, the model is evaluated by estimating the error of the inliers relative to the model.
This procedure is repeated a fixed number of times, each time producing either a model that is rejected because too few points are classified as inliers, or a refined model together with a corresponding error measure. In the latter case, we keep the refined model if its error is lower than that of the last saved model.
From “RANSAC: An Historical Perspective”, Bob Bolles & Marty Fischler, 2006.
![Page 58: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/58.jpg)
RANSAC
![Page 59: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/59.jpg)
RANSAC
Line fitting example:
Task: Estimate best line
![Page 60: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/60.jpg)
RANSAC
Line fitting example:
Sample two points
![Page 61: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/61.jpg)
RANSAC
Line fitting example:
Fit line
![Page 62: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/62.jpg)
RANSAC
Line fitting example:
Count the total number of points within a threshold of the line.
![Page 63: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/63.jpg)
RANSAC
Line fitting example:
Repeat until we get a good result.
![Page 64: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/64.jpg)
RANSAC
Line fitting example:
Repeat until we get a good result.
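A NumPy sketch of the line-fitting example that follows the five steps listed earlier: sample two points, fit the line through them, collect points within a distance threshold, reject small consensus sets, re-estimate on all inliers (here with a total-least-squares fit via SVD, our choice), and keep the refined model with the lowest mean inlier error. The iteration count, threshold, and `min_inliers` are assumed values; `ransac_line` is a hypothetical name.

```python
import numpy as np

def ransac_line(points, n_iters=200, threshold=0.1, min_inliers=10, seed=0):
    """Fit a*x + b*y + c = 0 (a^2 + b^2 = 1) to (n, 2) points; None if no model passes."""
    rng = np.random.default_rng(seed)
    best_line, best_err = None, np.inf
    for _ in range(n_iters):
        # 1. Sample two points and fit the line through them.
        p, q = points[rng.choice(len(points), size=2, replace=False)]
        normal = np.array([p[1] - q[1], q[0] - p[0]])   # perpendicular to q - p
        norm = np.linalg.norm(normal)
        if norm == 0:
            continue
        a, b = normal / norm
        c = -(a * p[0] + b * p[1])
        # 2. Points within `threshold` (perpendicular distance) are hypothetical inliers.
        dist = np.abs(points @ np.array([a, b]) + c)
        inliers = points[dist < threshold]
        # 3. Reject models with too few inliers.
        if len(inliers) < min_inliers:
            continue
        # 4. Re-estimate from all inliers: the normal is the direction of least variance.
        centroid = inliers.mean(axis=0)
        _, _, vt = np.linalg.svd(inliers - centroid)
        a, b = vt[-1]
        c = -(a * centroid[0] + b * centroid[1])
        # 5. Evaluate the refined model and keep it if its error is the lowest so far.
        err = np.mean(np.abs(inliers @ np.array([a, b]) + c))
        if err < best_err:
            best_line, best_err = (a, b, c), err
    return best_line

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 50)
on_line = np.column_stack([x, 2 * x + 1 + rng.normal(0, 0.01, size=50)])
outliers = rng.uniform([0, 0], [10, 21], size=(20, 2))
a, b, c = ransac_line(np.vstack([on_line, outliers]), min_inliers=30)
print(-a / b, -c / b)  # slope and intercept, close to 2 and 1
```

Many implementations instead keep the model with the largest consensus set and use the error only to break ties; with the lowest-error rule described on the slide, `min_inliers` has to be large enough that a small but very tight subset cannot win.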
![Page 65: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/65.jpg)
RANSAC
![Page 66: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/66.jpg)
RANSAC example: translation
Putative matches
Slide credit: A. Efros
![Page 67: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/67.jpg)
RANSAC example: translation
Select one match, count inliers
Slide credit: A. Efros
![Page 68: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/68.jpg)
RANSAC example: translation
Find “average” translation vector
Slide credit: A. Efros
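A NumPy sketch of the translation example, assuming the putative matches are given as two aligned `(n, 2)` arrays of point coordinates. Because a single match determines a translation, each hypothesis uses one randomly selected match; inliers are matches whose implied translation lies within a pixel threshold of the hypothesis, and the final estimate averages the translations of the largest inlier set. The threshold and iteration count are assumed values, and `ransac_translation` is a hypothetical name.

```python
import numpy as np

def ransac_translation(pts_a, pts_b, n_iters=100, threshold=2.0, seed=0):
    """Estimate a 2D translation from matches pts_a[i] <-> pts_b[i].

    Returns (translation, inlier_mask).
    """
    rng = np.random.default_rng(seed)
    offsets = pts_b - pts_a                       # translation implied by each match
    best_mask = np.zeros(len(pts_a), dtype=bool)
    for _ in range(n_iters):
        t = offsets[rng.integers(len(offsets))]   # select one match
        mask = np.linalg.norm(offsets - t, axis=1) < threshold   # count inliers
        if mask.sum() > best_mask.sum():
            best_mask = mask
    # "Average" translation vector over the largest inlier set.
    return offsets[best_mask].mean(axis=0), best_mask

rng = np.random.default_rng(2)
a_pts = rng.uniform(0, 100, size=(40, 2))
b_pts = a_pts + np.array([5.0, -3.0])
b_pts[:30] += rng.normal(0, 0.1, size=(30, 2))   # 30 noisy correct matches
b_pts[30:] = rng.uniform(0, 100, size=(10, 2))   # 10 wrong matches
t_est, mask = ransac_translation(a_pts, b_pts)
print(t_est, mask.sum())
```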
![Page 69: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/69.jpg)
RANSAC
Interest points (500/image)
Putative correspondences (268)
Outliers (117)
Inliers (151)
Final inliers (262)
![Page 70: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/70.jpg)
SIFT Applications
![Page 71: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/71.jpg)
SIFT Applications
![Page 72: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/72.jpg)
SIFT Applications
HDRSoft
![Page 73: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/73.jpg)
SIFT Applications
![Page 74: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/74.jpg)
Matching and Classification
SIFT allows reliable real‐time recognition, but at a computational cost that severely limits the number of points that can be handled.
A standard implementation requires 1 ms per feature point, which limits the number of feature points to 50 per frame if one requires frame‐rate performance.
![Page 75: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/75.jpg)
Matching and Classification
An alternative is to rely on statistical learning techniques to model the set of possible appearances of a patch.
The major challenge is to use simple models to allow for real‐time, efficient recognition.
![Page 76: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/76.jpg)
Matching and Classification
Can we match keypoints using simpler features, without intensive preprocessing?
We will assume that we can train a classifier for each keypoint class.
![Page 77: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/77.jpg)
Matching and Classification: Simple binary features
The test compares the intensities of two pixels, $m_{i,1}$ and $m_{i,2}$, around the keypoint:

$$
f_i = \begin{cases} 1 & \text{if } I(m_{i,1}) < I(m_{i,2}) \\ 0 & \text{otherwise} \end{cases}
$$
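As a minimal sketch of this test, the C++ below assumes a grayscale patch stored row-major with 8-bit intensities. The `Patch` and `Pixel` types and the `binaryFeature` function are hypothetical names for illustration, not the authors' code or any library API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal grayscale patch, stored row-major (a stand-in for a real image type).
struct Patch {
  int width = 0;
  int height = 0;
  std::vector<std::uint8_t> pixels;  // width * height intensities

  std::uint8_t at(int x, int y) const {
    assert(x >= 0 && x < width && y >= 0 && y < height);
    return pixels[static_cast<std::size_t>(y) * width + x];
  }
};

// A pixel position m = (x, y) inside the patch.
struct Pixel {
  int x;
  int y;
};

// Binary feature f_i: 1 if I(m_{i,1}) < I(m_{i,2}), 0 otherwise.
int binaryFeature(const Patch& patch, Pixel m1, Pixel m2) {
  return patch.at(m1.x, m1.y) < patch.at(m2.x, m2.y) ? 1 : 0;
}
```

Because the test only compares two intensities, it is unaffected by any monotonic change of brightness applied to the whole patch.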
![Page 78: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/78.jpg)
Matching and Classification: Without intensive preprocessing
We can synthetically generate the set of a keypoint's possible appearances under varying perspective, lighting, noise, etc.
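One way to picture this generation step is the sketch below. It simplifies the viewpoint change to an in-plane rotation plus scaling about the patch centre, uses nearest-neighbour resampling, and adds Gaussian noise. These are assumptions for illustration, not the authors' rendering procedure, and `synthesizeView` is a hypothetical name.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: renders one synthetic view of a square, row-major,
// 8-bit patch of side `size`. The patch is rotated by `angle` radians and
// scaled by `scale` about its centre, then Gaussian noise with standard
// deviation `noiseSigma` is added. Rotation + scale is a simplified stand-in
// for a full perspective and lighting model.
std::vector<std::uint8_t> synthesizeView(const std::vector<std::uint8_t>& src,
                                         int size, double angle, double scale,
                                         double noiseSigma, std::mt19937& rng) {
  std::vector<std::uint8_t> dst(src.size(), 0);  // pixels mapped from outside stay 0
  // normal_distribution needs a positive sigma, so it is only sampled when noiseSigma > 0.
  std::normal_distribution<double> noise(0.0, noiseSigma > 0.0 ? noiseSigma : 1.0);
  const double c = (size - 1) / 2.0;  // patch centre
  const double cosA = std::cos(angle), sinA = std::sin(angle);
  for (int y = 0; y < size; ++y) {
    for (int x = 0; x < size; ++x) {
      // Inverse mapping: find the source pixel that lands on (x, y).
      const double dx = (x - c) / scale, dy = (y - c) / scale;
      const long sx = std::lround(c + cosA * dx + sinA * dy);
      const long sy = std::lround(c - sinA * dx + cosA * dy);
      if (sx < 0 || sx >= size || sy < 0 || sy >= size) continue;
      double v = src[static_cast<std::size_t>(sy * size + sx)];
      if (noiseSigma > 0.0) v += noise(rng);
      dst[static_cast<std::size_t>(y * size + x)] =
          static_cast<std::uint8_t>(std::clamp(v, 0.0, 255.0));
    }
  }
  return dst;
}
```

Calling it many times with random angles, scales and noise levels yields the training samples for one keypoint class.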
![Page 79: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/79.jpg)
Matching and Classification: FERN Formulation
We model the class-conditional probabilities of a large number of binary features, which are estimated in a training phase.
At run time, these probabilities are used to select the best match for a given image patch.
![Page 80: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/80.jpg)
Matching and Classification: FERN Formulation
$f_i$: binary feature.
$N_f$: total number of features in the model.
$C_k$: class representing all views of an image patch around a keypoint.
Given $f_1, \ldots, f_{N_f}$, select the class $k$ such that

$$
\hat{k} = \arg\max_k P(C_k \mid f_1, f_2, \ldots, f_{N_f}) = \arg\max_k P(f_1, f_2, \ldots, f_{N_f} \mid C_k)
$$

The second equality follows from Bayes' rule with a uniform prior $P(C_k)$, since the evidence $P(f_1, \ldots, f_{N_f})$ does not depend on $k$.
Mustafa Ozuysal, Michael Calonder, Vincent Lepetit, Pascal Fua, "Fast Keypoint Recognition Using Random Ferns," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 99, 2009.
![Page 81: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/81.jpg)
Matching and Classification: FERN Formulation
However, it is not practical to model the joint distribution of all features. We group features into small sets (ferns) and assume independence between these sets (Semi-Naïve Bayesian Classifier):
$F_j$: a fern is defined to be a set of $S$ binary features $\{f_r, \ldots, f_{r+S-1}\}$.
$M$ is the number of ferns, so $N_f = S \times M$.
![Page 82: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/82.jpg)
Matching and Classification: FERN Formulation
Full joint distribution, $2^{N_f}$ parameters per class!

$$
P(f_1, f_2, \ldots, f_{N_f} \mid C_k)
$$

Naive Bayes, $N_f$ parameters per class, but too simple:

$$
P(f_1, f_2, \ldots, f_{N_f} \mid C_k) = \prod_{i=1}^{N_f} P(f_i \mid C_k)
$$

Ferns (semi-naive Bayes), $M \times 2^S$ parameters per class:

$$
P(f_1, f_2, \ldots, f_{N_f} \mid C_k) = \prod_{j=1}^{M} P(F_j \mid C_k)
$$
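To make the gap between the three models concrete, take the illustrative values $S = 10$ and $M = 30$ (chosen only for this arithmetic, not taken from the slides), so that $N_f = S \times M = 300$. The number of parameters per class is then:

$$
\begin{aligned}
\text{full joint:} \quad & 2^{N_f} = 2^{300} \approx 2.0 \times 10^{90} \\
\text{naive Bayes:} \quad & N_f = 300 \\
\text{ferns:} \quad & M \times 2^{S} = 30 \times 1024 = 30\,720
\end{aligned}
$$

The fern model stays tractable while still capturing the dependencies among the $S$ features inside each fern.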
![Page 83: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/83.jpg)
Matching and Classification: FERN Implementation
We generate a random set of binary features. A binary feature outputs a binary number (2 possibilities); a fern of three features therefore has 8 possibilities.
A fern with $S$ nodes outputs a number between 0 and $2^S - 1$.
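A minimal sketch of a fern, assuming a row-major 8-bit patch and tests stored as pairs of pixel indices: each test contributes one bit, and the first test ends up as the most significant bit, as in the `index <<= 1` loop of this section's 10-line listing. `Fern`, `randomFern` and `fernOutput` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

// A fern: S pairs of pixel indices into a row-major patch. Each pair is one
// binary test f = [I(first) < I(second)].
using Fern = std::vector<std::pair<std::size_t, std::size_t>>;

// Draws the S pixel pairs of one fern uniformly at random (done once, before training).
Fern randomFern(int S, std::size_t patchPixels, std::mt19937& rng) {
  std::uniform_int_distribution<std::size_t> pick(0, patchPixels - 1);
  Fern fern(static_cast<std::size_t>(S));
  for (auto& test : fern) test = {pick(rng), pick(rng)};
  return fern;
}

// Evaluates the S tests in order; each contributes one bit, so the output
// is an integer in [0, 2^S - 1].
int fernOutput(const std::vector<std::uint8_t>& patch, const Fern& fern) {
  int index = 0;
  for (const auto& [a, b] : fern) {
    index <<= 1;                          // make room for the next bit
    if (patch[a] < patch[b]) index |= 1;  // f = 1 iff I(a) < I(b)
  }
  return index;
}
```

For example, with intensities {10, 20, 30, 5} and tests (0,1), (1,0), (2,3), the bits are 1, 0, 0, so the fern outputs binary 100, i.e. 4.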
![Page 84: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/84.jpg)
Matching and Classification: FERN Implementation
When we have multiple patches of the same class, we can model the output of a fern with a multinomial distribution, that is, a probability for each possible output.
![Page 85: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/85.jpg)
Matching and Classification
Slide credit: V. Lepetit
![Page 86: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/86.jpg)
Matching and Classification
Slide credit: V. Lepetit
![Page 87: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/87.jpg)
Matching and Classification
Slide credit: V. Lepetit
![Page 88: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/88.jpg)
Matching and Classification
Slide credit: V. Lepetit
![Page 89: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/89.jpg)
Matching and Classification
Slide credit: V. Lepetit
![Page 90: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/90.jpg)
Matching and Classification
Normalize: $P(f_1, f_2, \ldots, f_n \mid C = c_i)$
Slide credit: V. Lepetit
![Page 91: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/91.jpg)
Matching and Classification: FERN Implementation
At the end of training, we have distributions over the possible fern outputs for each class.
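The training step amounts to counting, for each class, how often each fern output occurs and then normalizing. The sketch below does this for one fern, taking precomputed fern outputs and class labels as input. Starting every count at 1 (Laplace smoothing) is an assumption added here so that outputs never observed in training do not get probability zero; `trainFern` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical training step for one fern with S tests and H classes.
// outputs[n] is the fern output (0 .. 2^S - 1) of training patch n and
// labels[n] its class (0 .. H - 1). Returns P[k][leaf], an estimate of
// P(F = leaf | C_k). Every count starts at 1 (Laplace smoothing, an assumed
// choice) so that outputs never seen in training keep a nonzero probability.
std::vector<std::vector<double>> trainFern(const std::vector<int>& outputs,
                                           const std::vector<int>& labels,
                                           int S, int H) {
  assert(outputs.size() == labels.size());
  const std::size_t leaves = std::size_t{1} << S;
  std::vector<std::vector<double>> P(static_cast<std::size_t>(H),
                                     std::vector<double>(leaves, 1.0));
  for (std::size_t n = 0; n < outputs.size(); ++n)
    P[static_cast<std::size_t>(labels[n])][static_cast<std::size_t>(outputs[n])] += 1.0;
  for (auto& row : P) {  // normalize each class histogram into a distribution
    double total = 0.0;
    for (double c : row) total += c;
    for (double& c : row) c /= total;
  }
  return P;
}
```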
![Page 92: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/92.jpg)
Matching and Classification: FERN Implementation
To recognize a new patch, its fern outputs select one row of distributions for each fern, and these rows are then combined assuming independence between the distributions.
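A sketch of the recognition step, assuming per-fern tables indexed as `tables[j][k][leaf]`, an illustrative layout, not the authors' data structure. Under the independence assumption, each class score is the product of the selected probabilities. Summing logarithms yields the same argmax and avoids numerical underflow when many ferns are used. `classifyPatch` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical recognition step. tables[j][k][leaf] = P(F_j = leaf | C_k) for
// fern j and class k; outputs[j] is the output of fern j on the new patch.
// Assuming independence between ferns, a class's score is the product of the
// selected probabilities; summing logarithms gives the same argmax without
// numerical underflow.
int classifyPatch(const std::vector<std::vector<std::vector<double>>>& tables,
                  const std::vector<int>& outputs) {
  assert(!tables.empty() && tables.size() == outputs.size());
  const std::size_t numClasses = tables[0].size();
  int best = -1;
  double bestScore = 0.0;
  for (std::size_t k = 0; k < numClasses; ++k) {
    double score = 0.0;
    for (std::size_t j = 0; j < tables.size(); ++j)
      score += std::log(tables[j][k][static_cast<std::size_t>(outputs[j])]);
    if (best < 0 || score > bestScore) {
      best = static_cast<int>(k);
      bestScore = score;
    }
  }
  return best;
}
```

With two ferns whose tables favour class 0 for output 0 and class 1 for output 1, a patch producing outputs {0, 0} is assigned to class 0.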
![Page 93: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/93.jpg)
Matching and Classification
![Page 94: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/94.jpg)
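Matching and Classification: FERN Implementation
…in 10 lines of code…

    // Scores one patch against H classes using M ferns of S binary tests each.
    // K:      pointer to the keypoint's pixel in an 8-bit image
    // D:      2*S pixel offsets per fern, relative to K
    // PF:     per-fern, per-leaf class scores (assumed to be log-probabilities,
    //         so adding them multiplies the independent fern distributions)
    // shift1: stride between leaves of one fern in PF; shift2: stride between ferns
    // P:      output, one score per class
    void fern_scores(const unsigned char* K, const int* D, const float* PF,
                     int H, int M, int S, int shift1, int shift2, float* P) {
      for (int i = 0; i < H; i++) P[i] = 0.f;
      for (int k = 0; k < M; k++) {
        int index = 0;
        const int* d = D + k * 2 * S;
        for (int j = 0; j < S; j++) {  // one bit per binary test
          index <<= 1;
          if (K[d[0]] < K[d[1]]) index++;
          d += 2;
        }
        const float* p = PF + k * shift2 + index * shift1;  // row selected by the fern output
        for (int i = 0; i < H; i++) P[i] += p[i];
      }
    }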
![Page 95: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/95.jpg)
Matching and Classification
![Page 96: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/96.jpg)
Matching and Classification
![Page 97: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/97.jpg)
Matching and Classification
![Page 98: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/98.jpg)
Matching and Classification
![Page 99: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/99.jpg)
Matching and Classification
The FERN technique speeds up keypoint matching, but the training is slow and performed offline.
Hence, it is not suited for applications that require real-time online learning or the incremental addition of arbitrary numbers of keypoints (e.g., SLAM).
![Page 100: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/100.jpg)
Matching and Classification
This limitation can be removed if we train a FERN classifier to recognize a number of keypoints extracted from a reference database, and characterize all other keypoints in terms of their response to these classification ferns (their signature).
![Page 101: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/101.jpg)
Matching and Classification
M. Calonder, V. Lepetit, and P. Fua, Keypoint Signatures for Fast Learning and Recognition. In Proceedings of European Conference on Computer Vision, 2008.
![Page 102: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/102.jpg)
Matching and Classification
It can be shown empirically that these signatures are stable under changes in viewing conditions.
Signatures become sparse if we apply a threshold function to them.
Signatures need no additional training phase for new keypoints, and matching them (nearest neighbor) scales well with the number of classes.
![Page 103: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/103.jpg)
Matching and Classification
However, matching signatures still involves many more elementary operations than absolutely necessary.
Moreover, evaluating the signatures requires storing many distributions of the same size as the signatures themselves and, therefore, large amounts of memory.
![Page 104: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/104.jpg)
Matching and Classification
The full response vector r(p) for all J ferns combines the vectors stored at the leaves that p reaches, each storing the probability that p is one of the N reference points; Z is a normalizer such that the elements of r(p) sum to one.
In practice, when p truly corresponds to one of the reference keypoints, r(p) contains one element that is close to one, while all the others are close to zero.
Otherwise, it contains a few relatively large values that correspond to reference keypoints similar in appearance, and small values elsewhere.
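The combination rule for the J leaf vectors is not written out in the text. As a minimal sketch, assuming the vectors are added and the sum is then divided by Z so that the entries of r(p) sum to one, the response vector could be computed as below. `responseVector` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// leafVectors[j] is the N-dimensional vector stored at the leaf that patch p
// reaches in fern j (probabilities that p is each of the N reference points).
// Assumed combination rule: add the J vectors, then divide by Z, the sum of
// all entries, so that the elements of r(p) sum to one.
std::vector<double> responseVector(const std::vector<std::vector<double>>& leafVectors) {
  assert(!leafVectors.empty());
  std::vector<double> r(leafVectors[0].size(), 0.0);
  for (const auto& v : leafVectors) {
    assert(v.size() == r.size());
    for (std::size_t i = 0; i < r.size(); ++i) r[i] += v[i];
  }
  double Z = 0.0;
  for (double x : r) Z += x;
  if (Z > 0.0)
    for (double& x : r) x /= Z;
  return r;
}
```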
![Page 105: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/105.jpg)
Matching and Classification
We can compute a sparse signature by applying a point-wise threshold function with a value θ.
The result is an N-dimensional vector with only a few non-zero elements that is mostly invariant to different imaging conditions, and therefore provides a useful descriptor for matching purposes.
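A sketch of thresholding a response vector into a sparse signature and then matching it by nearest neighbour. Two details are assumptions: entries at or above θ keep their value, and the L1 distance is used as the metric. `sparseSignature` and `nearestSignature` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Point-wise threshold with value theta: entries below theta become 0.
// Keeping the surviving entries unchanged (rather than, say, setting them to 1)
// is an assumption of this sketch.
std::vector<double> sparseSignature(const std::vector<double>& r, double theta) {
  std::vector<double> s(r.size(), 0.0);
  for (std::size_t i = 0; i < r.size(); ++i)
    if (r[i] >= theta) s[i] = r[i];
  return s;
}

// Nearest-neighbour matching of a query signature against stored signatures,
// using the L1 distance (an assumed choice of metric).
std::size_t nearestSignature(const std::vector<double>& query,
                             const std::vector<std::vector<double>>& database) {
  assert(!database.empty());
  std::size_t best = 0;
  double bestDist = std::numeric_limits<double>::infinity();
  for (std::size_t n = 0; n < database.size(); ++n) {
    assert(database[n].size() == query.size());
    double d = 0.0;
    for (std::size_t i = 0; i < query.size(); ++i)
      d += std::fabs(query[i] - database[n][i]);
    if (d < bestDist) {
      bestDist = d;
      best = n;
    }
  }
  return best;
}
```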
![Page 106: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/106.jpg)
Matching and Classification
Figure: the patch p is dropped through J ferns; in each fern it reaches a leaf that stores a vector with the probability that p is one of the N reference points.
Typical parameters: J = 50, d = 10, N = 500 (d is the fern depth).
![Page 107: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/107.jpg)
Matching and Classification
Typical parameters: J = 50, d = 10, N = 500.
For each of the $2^d$ leaves in each of the J ferns, we need an N-dimensional vector of floats.
The total memory requirement is $M = b \, J \, 2^d N$ bytes, where b is the number of bytes needed to store a float (4 for single precision). In practice, about 100 MB!
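Plugging the typical parameters into the memory formula, with 4-byte single-precision floats, gives the order of magnitude quoted for this slide (8-byte doubles would double it to roughly 200 MB):

$$
b \, J \, 2^{d} N = 4 \times 50 \times 2^{10} \times 500 = 102\,400\,000 \text{ bytes} \approx 100 \text{ MB}
$$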
![Page 108: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/108.jpg)
Matching and Classification
Compressive sensing literature:
• High-dimensional sparse vectors can be reconstructed from their linear projections into much lower-dimensional spaces.
• The Johnson–Lindenstrauss lemma states that a small set of points in a high-dimensional space can be embedded into a space of much lower dimension in such a way that distances between the points are nearly preserved.
![Page 109: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/109.jpg)
Matching and Classification
Many kinds of matrices can be used for this purpose.
Random Ortho-Projection (ROP) matrices are a good choice and can easily be constructed by applying a Gram-Schmidt orthonormalization process to a random matrix.
![Page 110: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/110.jpg)
Matching and Classification
In mathematics, the Gram–Schmidt process is a method for orthonormalizing a set of vectors in an inner product space, most commonly the Euclidean space $\mathbb{R}^n$.
The Gram–Schmidt process takes a finite, linearly independent set $S = \{v_1, \ldots, v_k\}$ for $k \le n$ and generates an orthogonal set $S' = \{u_1, \ldots, u_k\}$ that spans the same k-dimensional subspace of $\mathbb{R}^n$ as S.
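A minimal sketch of building an ROP matrix as described: fill an M × N matrix with random values (Gaussian entries are an assumption here), orthonormalize its rows with Gram–Schmidt, and use it to project an N-dimensional signature down to M dimensions. The loop uses the "modified" ordering of Gram–Schmidt, which gives the same result as the classical one in exact arithmetic but is numerically more stable. `gramSchmidt`, `randomOrthoProjection` and `project` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // row-major; each row is a vector

// Orthonormalizes the rows of A (assumed linearly independent) using the
// modified Gram-Schmidt ordering: each row is reduced against the previous
// orthonormal rows, one at a time, and then normalized to unit length.
Matrix gramSchmidt(Matrix A) {
  for (std::size_t i = 0; i < A.size(); ++i) {
    for (std::size_t j = 0; j < i; ++j) {  // remove the component along row j
      double dot = 0.0;
      for (std::size_t c = 0; c < A[i].size(); ++c) dot += A[i][c] * A[j][c];
      for (std::size_t c = 0; c < A[i].size(); ++c) A[i][c] -= dot * A[j][c];
    }
    double norm = 0.0;
    for (double x : A[i]) norm += x * x;
    norm = std::sqrt(norm);
    assert(norm > 1e-12 && "rows must be linearly independent");
    for (double& x : A[i]) x /= norm;  // normalize to unit length
  }
  return A;
}

// Random Ortho-Projection: an M x N Gaussian random matrix (M <= N) whose
// rows are then orthonormalized.
Matrix randomOrthoProjection(std::size_t M, std::size_t N, std::mt19937& rng) {
  assert(M <= N);
  std::normal_distribution<double> gauss(0.0, 1.0);
  Matrix A(M, std::vector<double>(N));
  for (auto& row : A)
    for (double& x : row) x = gauss(rng);
  return gramSchmidt(A);
}

// Projects an N-dimensional signature s down to M dimensions: y = P s.
std::vector<double> project(const Matrix& P, const std::vector<double>& s) {
  std::vector<double> y(P.size(), 0.0);
  for (std::size_t i = 0; i < P.size(); ++i)
    for (std::size_t c = 0; c < s.size(); ++c) y[i] += P[i][c] * s[c];
  return y;
}
```

Gaussian rows are linearly independent with probability one, so the assertion in `gramSchmidt` only guards against degenerate input.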
![Page 111: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/111.jpg)
Matching and Classification
M. Calonder, V. Lepetit, P. Fua, K. Konolige, J. Bowman, and P. Mihelich, Compact Signatures for High‐speed Interest Point Description and Matching. In Proceedings of International Conference on Computer Vision, 2009.
![Page 112: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/112.jpg)
Matching and Classification
![Page 113: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/113.jpg)
Matching and Classification
![Page 114: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/114.jpg)
Matching and Classification
This approach reduces the memory requirement when storing the models: for N = 512 and M = 176 (the dimension after projection), the requirements change from 93.75 MB to 175 B!
The CPU time is 6.3 ms for an exhaustive nearest-neighbor matching of 256 points (256 × 256).
![Page 115: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/115.jpg)
Internet-scale image databases
![Page 116: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/116.jpg)
Min HASH
How can we find similar images in very large datasets?
Can we get clusters from these images?
![Page 117: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/117.jpg)
Min HASH
Let's suppose that we choose a LARGE bag-of-words representation of our images and that we use a binary histogram.
![Page 118: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/118.jpg)
Min HASH
Given two different images, we can compute their histogram intersection:
![Page 119: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/119.jpg)
Min HASH
…and their histogram union:
![Page 120: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/120.jpg)
Min HASH
Then we can define a set similarity measure in the following way:

$$
\mathrm{Sim}(A_1, A_2) = \frac{|A_1 \cap A_2|}{|A_1 \cup A_2|}
$$

That is, the number of visual words the two images have in common divided by the total number of visual words present in at least one of the two images.
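For binary histograms this measure is the Jaccard index, and it can be computed in one pass over the vocabulary. A minimal sketch, assuming each histogram is a 0/1 vector indexed by visual word; `setSimilarity` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Set similarity of two binary bag-of-words histograms over the same
// vocabulary (entry w is 1 if visual word w occurs in the image, else 0):
// |A1 intersect A2| / |A1 union A2|.
// Returning 0 for two empty histograms is an assumed convention.
double setSimilarity(const std::vector<int>& a1, const std::vector<int>& a2) {
  assert(a1.size() == a2.size());
  std::size_t common = 0, either = 0;
  for (std::size_t w = 0; w < a1.size(); ++w) {
    if (a1[w] && a2[w]) ++common;  // word present in both images
    if (a1[w] || a2[w]) ++either;  // word present in at least one image
  }
  return either == 0 ? 0.0 : static_cast<double>(common) / static_cast<double>(either);
}
```

For histograms {1, 0, 1, 1, 0} and {1, 1, 0, 1, 0}, two words are shared out of four present overall, giving 2/4 = 0.5.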
![Page 121: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/121.jpg)
Min HASH
![Page 122: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/122.jpg)
Min HASH
We can perform clustering or matching of an unordered set of images with this measure, but only with a limited amount of data!
The method requires

$$
\sum_{i=1}^{w} d_i^2
$$

similarity evaluations, where w is the size of the vocabulary and $d_i$ is the number of regions assigned to the i-th visual word. A commonly used vocabulary size is w = 1,000,000.
![Page 123: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/123.jpg)
Min HASH
Observation: histograms for an image are highly sparse!
![Page 124: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/124.jpg)
Min HASH
The key idea of min-hash is to map ("hash") each row/histogram A to a small amount of data Sig(A) (the signature) such that:
• Sig(A) is small enough.
• Rows A1 and A2 are highly similar if Sig(A1) is highly similar to Sig(A2).
![Page 125: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/125.jpg)
Min HASH
Useful convention: we will refer to columns as being of four types:

| Type | a | b | c | d |
|---|---|---|---|---|
| A1 | 1 | 0 | 1 | 0 |
| A2 | 1 | 1 | 0 | 0 |

We will also use "a" as the number of columns of type a (and likewise for b, c and d).
Notes:
• Sim(A1, A2) = a/(a+b+c).
• Most columns are of type d.
![Page 126: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/126.jpg)
Min HASH
• Imagine the columns randomly permuted.
• Hash each row A to h(A), the number of the first column (in the permuted order) in which row A has a 1.
Figure: a random permutation π of the columns, giving h(A1) = 2 and h(A2) = 2.
The probability that h(A1) = h(A2) is a/(a+b+c) = Sim(A1, A2). Columns of type d (0 in both rows) never decide the hash, so the first column with a 1 in either row is equally likely to be any of the a+b+c remaining columns: the hashes agree if that column is of type a and disagree if it is of type b or c.
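The hash h(A) for one permutation can be sketched as below, with the permutation given explicitly as the order in which the original columns are visited. `minHash` is a hypothetical name, and returning 0 for an all-zero row is an assumed convention.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Min-hash of a binary row under one column permutation. perm[i] is the
// original column placed at position i of the permuted order; h(A) is the
// 1-based position of the first permuted column in which the row has a 1.
// Returning 0 for an all-zero row is an assumed convention.
int minHash(const std::vector<int>& row, const std::vector<std::size_t>& perm) {
  assert(row.size() == perm.size());
  for (std::size_t i = 0; i < perm.size(); ++i)
    if (row[perm[i]] == 1) return static_cast<int>(i) + 1;
  return 0;
}
```

With A1 = {1, 0, 1, 0} and A2 = {1, 1, 0, 0} (one column of each type a, b, c, d), a permutation that visits the type-a column first makes both hashes equal 1; one that visits the type-b column first gives h(A1) = 2 but h(A2) = 1.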
![Page 127: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/127.jpg)
Min HASH
If we repeat the experiment with a new permutation of columns a large number of times, say 512, we get a signature consisting of 512 column numbers for each row.
The "similarity" of these lists (the fraction of positions in which they agree) will be very close to the similarity of the rows (similar signatures mean similar rows!).
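A sketch of the repeated experiment: draw K random permutations with `std::shuffle`, record each row's min-hash under every permutation, and estimate similarity as the fraction of agreeing positions. `minHashSignatures` and `signatureSimilarity` are hypothetical names; K = 512 matches the slide, but any K works.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Builds a K-entry min-hash signature for every binary row, one entry per
// random column permutation. Entry k is the 1-based position of the row's
// first 1 under permutation k (0 for an all-zero row).
std::vector<std::vector<int>> minHashSignatures(
    const std::vector<std::vector<int>>& rows, int K, std::mt19937& rng) {
  const std::size_t n = rows.empty() ? 0 : rows[0].size();
  std::vector<std::size_t> perm(n);
  std::iota(perm.begin(), perm.end(), std::size_t{0});
  std::vector<std::vector<int>> sigs(rows.size());
  for (int k = 0; k < K; ++k) {
    std::shuffle(perm.begin(), perm.end(), rng);  // a fresh random permutation
    for (std::size_t r = 0; r < rows.size(); ++r) {
      int h = 0;
      for (std::size_t i = 0; i < n; ++i)
        if (rows[r][perm[i]] == 1) { h = static_cast<int>(i) + 1; break; }
      sigs[r].push_back(h);
    }
  }
  return sigs;
}

// Fraction of positions in which two signatures agree: an estimate of Sim(A1, A2).
double signatureSimilarity(const std::vector<int>& s1, const std::vector<int>& s2) {
  assert(s1.size() == s2.size() && !s1.empty());
  std::size_t agree = 0;
  for (std::size_t i = 0; i < s1.size(); ++i)
    if (s1[i] == s2[i]) ++agree;
  return static_cast<double>(agree) / static_cast<double>(s1.size());
}
```

Identical rows always produce identical signatures, and rows with no 1 in common never agree; for rows with true similarity 0.5, the estimate from 512 permutations is typically within a few hundredths of 0.5.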
![Page 128: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/128.jpg)
Min HASH
In fact, it is not necessary to permute the columns: we can hash each original column with 512 different hash functions and, independently for each of the 512 hash functions, keep for each row the lowest hash value among the columns in which that row has a 1. Then we look for coincidences.

| | col 1 | col 2 | col 3 | col 4 | col 5 | signature entry |
|---|---|---|---|---|---|---|
| row | 1 | 0 | 0 | 1 | 0 | |
| h1 | 5 | 1 | 3 | 2 | 4 | h1(row) = 2 |
| h2 | 1 | 2 | 5 | 3 | 4 | h2(row) = 1 |
| h3 | 3 | 4 | 1 | 5 | 2 | h3(row) = 3 |
| h4 | 2 | 5 | 4 | 1 | 3 | h4(row) = 1 |
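The hash-function variant can be sketched with each hash function given as an explicit table of values per column, exactly as in the example above. A real system would compute these values from the column index instead of storing tables. `minHashSignature` is a hypothetical name.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Min-hash signature without permuting columns. hashTables[k][c] is the
// value hash function h_k assigns to column c. For each hash function we
// keep the smallest value over the columns in which the row has a 1
// (INT_MAX if the row has no 1 at all).
std::vector<int> minHashSignature(const std::vector<int>& row,
                                  const std::vector<std::vector<int>>& hashTables) {
  std::vector<int> sig;
  sig.reserve(hashTables.size());
  for (const auto& h : hashTables) {
    assert(h.size() == row.size());
    int best = INT_MAX;
    for (std::size_t c = 0; c < row.size(); ++c)
      if (row[c] == 1 && h[c] < best) best = h[c];
    sig.push_back(best);
  }
  return sig;
}
```

Applied to the row 1 0 0 1 0 with the four tables above, it returns the signature (2, 1, 3, 1).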
![Page 129: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/129.jpg)
Min HASH

| | col 1 | col 2 | col 3 | col 4 | col 5 |
|---|---|---|---|---|---|
| Row 1 | 1 | 0 | 1 | 1 | 0 |
| Row 2 | 0 | 1 | 0 | 0 | 1 |
| Row 3 | 1 | 1 | 0 | 1 | 0 |
| h1 | 1 | 2 | 3 | 4 | 5 |
| h2 | 5 | 4 | 3 | 2 | 1 |
| h3 | 3 | 4 | 5 | 1 | 2 |

Signatures (Row 1, Row 2, Row 3): h1(row) = 1, 2, 1; h2(row) = 2, 1, 2; h3(row) = 1, 2, 1.

Similarities:

| Pair | Row-Row | Sig-Sig |
|---|---|---|
| 1-2 | 0/5 | 0/3 |
| 1-3 | 2/4 | 3/3 |
| 2-3 | 1/4 | 0/3 |
![Page 130: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/130.jpg)
Min Hash
For efficient retrieval, the min-hashes are grouped into n-tuples. In this example, we can form the following 2-tuples:

| | Row 1 | Row 2 | Row 3 |
|---|---|---|---|
| h1(row) | 1 | 2 | 1 |
| h2(row) | 2 | 1 | 2 |
| h3(row) | 1 | 2 | 1 |
| h4(row) | 3 | 2 | 3 |

The retrieval procedure then estimates the full similarity only for those image pairs that have at least h identical tuples out of k tuples.
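A sketch of the acceptance rule, assuming tuples are formed from consecutive signature entries, e.g. (h1, h2) and (h3, h4) in the example above. `identicalTuples` and `isCandidatePair` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how many n-tuples two min-hash signatures share. Tuples are formed
// from consecutive signature entries (an assumed grouping); a trailing partial
// tuple is ignored.
int identicalTuples(const std::vector<int>& s1, const std::vector<int>& s2, std::size_t n) {
  assert(s1.size() == s2.size() && n > 0);
  int shared = 0;
  for (std::size_t t = 0; t + n <= s1.size(); t += n) {
    bool same = true;
    for (std::size_t i = t; i < t + n; ++i)
      if (s1[i] != s2[i]) same = false;
    if (same) ++shared;
  }
  return shared;
}

// An image pair becomes a candidate, and has its full similarity estimated,
// only if at least h of its tuples are identical.
bool isCandidatePair(const std::vector<int>& s1, const std::vector<int>& s2,
                     std::size_t n, int h) {
  return identicalTuples(s1, s2, n) >= h;
}
```

With the signatures above, rows 1 and 3, (1, 2, 1, 3) and (1, 2, 1, 3), share both 2-tuples, while rows 1 and 2 share none. Comparing pairs one by one like this only illustrates the rule; at scale, each tuple would typically serve as a key into a hash table, so that pairs sharing a tuple are found without enumerating all pairs.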
![Page 131: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/131.jpg)
Min Hash
From 100k images...
![Page 132: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/132.jpg)
Min Hash
From 100k images...
![Page 133: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/133.jpg)
Min Hash
From 100k images...
Representatives of the largest clusters
![Page 134: Pc Seminar Jordi](https://reader034.vdocuments.pub/reader034/viewer/2022051109/549843e1b47959424d8b541e/html5/thumbnails/134.jpg)
Min Hash
Automatic localization of different buildings