monga paper 1

8/3/2019 Monga paper 1

1/14

IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 15, NO. 11, NOVEMBER 2006 3453

Perceptual Image Hashing Via Feature Points:Performance Evaluation and Tradeoffs

Vishal Monga , Member, IEEE , and Brian L. Evans , Senior Member, IEEE

Abstract We propose an image hashing paradigm using visu-ally signicant feature points. The feature points should be largelyinvariant under perceptually insignicant distortions. To satisfythis, we propose an iterative feature detector to extract signi-cant geometry preserving feature points. We apply probabilisticquantization on the derived features to introduce randomness,which, in turn, reduces vulnerability to adversarial attacks. Theproposed hash algorithm withstands standard benchmark (e.g.,Stirmark) attacks, including compression, geometric distortionsof scaling and small-angle rotation, and common signal-processingoperations. Content changing (malicious) manipulations of imagedata are also accurately detected. Detailed statistical analysis

in the form of receiver operating characteristic (ROC) curvesis presented and reveals the success of the proposed scheme inachieving perceptual robustness while avoiding misclassication.

Index Terms Feature extraction, hashing, image authentica-tion, image indexing.

I. INTRODUCTION

I N cryptography, hash functions are typically used for digitalsignatures to authenticate the message being sent so that therecipient can verify its source. A key feature of conventionalhashing algorithms such as MD5 and SHA-1 is that they areextremely sensitive to the message [1], i.e., a 1-bit change inthe input changes the output dramatically. Data such as digitalimages, however, undergo various manipulations such as com-pression and enhancement. An image hash function takes intoaccount changes in the visual domain. In particular, a percep-tual image hash function should have the property that two im-ages that look the same to the human eye map to the same hashvalue, even if the images have different digital representations,e.g., separated by a large distance in mean-squared error.

An immediately obvious application for a perceptual imagehash is identication/search of images in large databases. Sev-eral other applications have been identied recently in contentauthentication, watermarking [2], andanti-piracy search. Unliketraditional search, these scenarios are adversarial, and requirethe hash to be a randomized digest.

Manuscript received May 12, 2005; revised April 13, 3006. This work wassupported by a gift from the Xerox Foundation and an equipment donation fromIntel. The associate editor coordinating the review of this manuscript and ap-proving it for publication was Dr. Ron Kimmel.

V. Monga is with the Xerox Innovation Group, El Segundo, CA 90245 USA(e-mail: [email protected]).

B. L. Evans is with the Embedded Signal Processing Laboratory, Departmentof Electrical and Computer Engineering, The University of Texas at Austin,Austin, TX 78712 USA (e-mail: [email protected]).

Digital Object Identier 10.1109/TIP.2006.881948

The underlying techniques for constructing image hashes canroughly be classied into methods based on image statistics[3][5] , relations [6], [7], preservation of coarse image repre-sentation [8][10], and low-level image feature extraction [11],[12].

The approaches in [3] and [4] compute statistics such asmean, variance, and higher moments of intensity values of image blocks. They conjecture that such statistics have goodrobustness properties under small perturbations to the image.A serious drawback with these methods is that it is easy to

modify an image without altering its intensity histogram. This jeopardizes the security properties of any scheme that relieson intensity statistics. Venkatesan et al. [5] develop an imagehash based on an image statistics vector extracted from thevarious subbands in a wavelet decomposition of the image.They observe that statistics such as averages of coarse subbandsand variances of other (ne detail) subbands stay invariantunder a large class of content-preserving modications to theimage. Although statistics of wavelet coefcients have beenfound to be far more robust than intensity statistics, they do notnecessarily capture content changes 1 well, particularly thosethat are maliciously generated.

A typical relation-based technique for image authentication,which tolerates JPEG compression, has been reported by Linand Chang [6]. They extract a digital signature by using theinvariant relationship between any two discrete cosine trans-form (DCT) coefcients, which are at the same position of twodifferent 8 8 blocks. They found these invariance propertiescould be preserved before and after JPEG compression as longas it is perceptually lossless. This scheme, although robust toJPEG compression, remains vulnerable to several other percep-tually insignicant modications, e.g., where the statistical na-ture of distortion is different from the blur caused by compres-sion. Recently, Lu et al. [7] have proposed a structural digitalsignature for image authentication. They observe that in a sub-

band wavelet decomposition, a parent and child node are uncor-related, but they are statistically dependent. In particular, theyobserve that the difference of the magnitude of wavelet coef-cients at consecutive scales (i.e., a parent node and its four childnodes) remains largely preserved for several content-preservingmanipulations. Identifying such parent-child pairs and subse-quently encoding the pairs form their robust digital signature.Their scheme, however, is very sensitive to global (e.g., small

1A contentchange here signiesa perceptually meaningful perturbation to theimage, e.g., adding/removing an object, signicant change in texture, morphinga face etc. In general, a perceptual hash should be sensitive to both incidental aswell as malicious content changes. A major challenge in secure image hashingis to develop algorithms that can detect (with high probability) malicious tam-

pering of image data.

1057-7149/$20.00 2006 IEEE


2/14

3454 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 15, NO. 11, NOVEMBER 2006

rotation and bending) as well as local (Stirmark) geometric dis-tortions, which do not cause perceptually signi cant chang es tothe image.

Fridrich and Goljan [8] propose a robust hash base d onpreserving selected (low-frequency) DCT coef cients. T heirmethod is based on the observation that large change s to

low frequency DCT coef cients of the image are neede d tochange the appearance of the image signi cantly. Mihcak andVenkatesan [9] develop another image hashing algorithm byusing an iterative approach to binarize the DC subband (lo westresolution wavelet coef cients) in a wavelet decompositio n of the image. Very recently, Kozat et al. [10] proposed an im agehashing scheme by retaining the strongest singular ve ctorsand values in a singular value decomposition (SVD) of im ageblocks. Approaches based on coarse image representat ions[8][10] have been shown to possess excellent robustness u nderperceptually insigni cant modi cations to the image, but theyremain vulnerable to local tampering or content changes.

In this paper, we present a framework for perce ptualimage hashing using feature points. Feature point dete ctorsare attractive for their inherent sensitivity to content chan gingmanipulations. Current approaches based on feature p oints[11] , [12] have limited utility in perceptual hashing applicat ionsbecause they are sensitive to several perceptually insigni cantmodi cations as well. We propose to extract signi cant im agefeatures by using a wavelet based feature detection algor ithmbased on the characteristics of the visual system [13] . Fur ther,an iterative procedure based on observations in [9] is usedto lock onto a set of image feature-points with exce llentinvariance properties to perceptually insigni cant perturbati ons.Unlike the use of public-key encryption schemes in [11] , [12]

probabilistic quantization is used to binarize the extra ctedfeature vector.

We develop both deterministic as well as randomized has h al-gorithms. Randomization has been identi ed [5] , [9], [10] to besigni cant for lending unpredictability to the hash and, he nce,reducing the vulnerability of the hash algorithm to maliciou s in-puts generated by an adversary.

Our hash exhibits excellent robustness under bench mark attacks (e.g., Stirmark) including compression, geom etricdistortions of scaling and small angle rotation, and com monsignal processing operations while successfully discrimina tingbetween perceptually distinct images.

Section II-A formally de nes the desired propertie s of a perceptual image hash. Section II-B then presents a n oveltwo-stage framework for perceptual image hashing that con sistsof feature extraction followed by feature vector compres sion.We call the output of the rst stage an intermediate hash. Therest of the paper then focuses on building such interme diatehash vectors. Section III presents a robust feature detector b asedon visually signi cant end-stopped wavelets [14]. Sectio n IVpresents a probabilistic quantization approach to binarize im agefeature vectors that enhances robustness, and at the same t ime,introduces randomness. Iterative algorithms (both determin isticand randomized) that construct intermediate hash vectors aredescribed in Section V . Experimental results demonstra ting

perceptual robustness, sensitivity to content changes, and ROCanalysis across 1000 different images are reported in Sec-

tions VI-A D. Concluding remarks and suggestions for futurework are c ollected in Section VII .

II. A U NIFYING FRAMEWORK FOR PERCEPTUAL HASHING

A. Percept ual Image Hash: Desired Properties

Section I describes many possible applications for an imagehash. The hash algorithm developed in this paper, however, isspeci cally targeted at two scenarios: 1) content authenticationand 2) anti -piracy search.

The form er requires that the hash should remain invariant if the image u ndergoes a content preserving (even though lossy)transforma tion. Then, a robust hash can serve as substitute fora robust wa termark meant to be retained as long as the image isnot signi cantly changed. The difference with watermarking isthat hashin g is passive and does not require any embedding intothe image. Anti-piracy search is aimed at thwarting a pirate orattacker wh o may claim ownership of proprietary image content

bysuitably modifyingit.Asanexampleconsiderapirate/attackerwho may st eal images from an of cial website and post them ashisownby compressing themor sending themthrough geometricdistortions such as print scan. Since comparing with all imageson the web is impractical, the true/of cial content owner mayemploy a w eb-crawler software that computes hashes of imageson random ly accessed webpages. If the hashes match those of their own i mage content, pirates can be caught.

In view of the above, we develop formally, the desired prop-erties of a perceptual image hash. Let denote a set of images(e.g., all na tural images of a particular size) with nite cardi-nality. Als o, let denote the space of secret keys. 2 Our hashfunction th en takes two inputs, an image and a secret key

, to produce a -bit binary hash value .Let denote an image such that looks the sameas . Likew ise, an image in , perceptually distinct from willbe denoted by . Let satisfy : Then,three desira ble properties of a perceptual hash are identi ed asfollows.

1) Perce ptual robustness :

2) Fragi lity to visually distinct images :

3) Unpr edictability of the hash :

Let ; i.e., the set of allpossible re alizations of the hash algorithm on the product space

. Also ,fora xed de ne ;i.e., for a xed image the set of all possible realizations of thehash algori thm over the key space .

2The key sp acein general can be constructed in several ways. A necessary butnot suf cient condition for secure hashing is that the key space should be largeenough to pre clude exhaustive search. For this paper, unless speci ed explicitlywe will assum e the key space to be the hamming space of 32-bit binary strings.


3/14

MONGA AND EVANS: PERCEPTUAL IMAGE HASHING VIA FEATURE POINTS 3455

Fig. 1. Block diagram of the Hash function.

Note that the probability measure in the rst two properties isdened over the set . For example, property 1 requires that forany pair of perceptually identical images in and anythe hash values must be identical with high probability. Theprobability measure in the third property, however, is de nedon ; i.e., the third property requires that as the secret key isvaried over for a xed input image; the output hash value mustbe approximately uniformly distributed among all possible -bitoutputs.

Further, the three desirable hash properties con ict with oneanother. The rst property amounts to robustness under small

perturbations whereas the second one requires minimization of collision probabilities for perceptually distinct inputs. There isclearly a tradeoff here; e.g., if very crude features were used,then they would be hard to change (i.e., robust), but it is likelythat one is going to encounter collision of perceptually differentimages. Likewise for perfect randomization, a uniform distribu-tion on the output hash values (over the key space) would beneeded which in general, would deter achieving the rst prop-erty. From a security viewpoint, the second and third propertiesare very important; i.e., it must be extremely dif cult for the ad-versary to manipulate the content of an image and yet obtain thesame hash value. It is desirable for the hash algorithm to achievethese (con icting) goals to some extent and/or facilitate trade-offs.

B. Hashing Framework

We partition the problem of deriving an image hash into twosteps, as illustrated in Fig. 1 . The rst step extracts a featurevector from the image, whereas the second stage compressesthis feature vector to a nal hash value. In the feature extractionstep, the 2-D image is mapped to a 1-D feature vector. This fea-ture vector must capture the perceptual qualities of the image.That is, two images that appear identical to the human visualsystem should have feature vectors that are close in some dis-

tance metric. Likewise, two images that are clearly distinct inappearance must have feature vectors that differ by a large dis-tance. For the rest of the paper, we will refer to this visuallyrobust feature vector (or its quantized version) as the interme-diate hash. The second step then compresses this intermediatehash vector to a nal hash value.

This paper focuses on a feature-point based solution to stage 1of the hash algorithm. Solutions for stage 2 have been proposedin [5] and [15] [17] .

III. F EATURE EXTRACTION

For the perceptual hashing problem, two major attributes of the feature detector are identi ed: 1) generality and 2) robust-ness. The generality criterion addresses the issue of whether or

not the feature detector can be used over a variety of imagesand applications. Robustness is desirable for the features to beretained in perceptually identical images.

Feature detectioncontinues to be an importantvision problemand a plethora of algorithms have been reported. For an exten-sive coverage of feature detection, the reader is referred to [18].In this section, we review selected and best known robust in-terest point detectors that are particularly well suited for thehashing problem. We also propose our feature detection methodand experimentally evaluate it against existing approaches.

A. Harris Detector Possibly the best known corner detector, the Harris detector

[19] uses differential features of the image. The construction of the detector is based on a corner detection created by Moravec[20]. The response function is calculated for a shiftfrom the central point

(1)

where represents the luminance of the image at the coor-

dinate , and the function represents a rectangularGaussian response window centered at . In essence, thiscorner detector functions by considering a local window in theimage(represented by the spanof ),anddetermining theaverage changes of image intensity by shifting the window bysmall amounts in various directions. A corner can, thus, be de-tected when the change produced by any of the shifts (i.e., in allpossible directions) is large.

Harris reformulated the detection function using a matricialformulation. Let

(2)

(3)

The detection function is then given by and

(4)

may, hence, be interpreted as the auto-correlation of theimage with the shape function . Harris and Stephens gavea new de nition of the detector function using the eigenvalues

and of the matrix . These values are invariant by rota-tion and if their magnitudes are high, the local auto-correlationfunction is represented by a local peak. To avoid computing the


4/14


eigenvalue of , the new criterion is based on the trace and de-terminant of

(5)

where is an arbitrary constant. Feature points extraction isachieved by applying a threshold on the response andsearching for local maxima.

B. Hessian Afne

A similar idea is explored in the detector based on the Hessianmatrix [21] . The difference is in the matrix which is nowgiven by

(6)

where and .The second derivatives, which are used in this matrix give

strong responses on blobs and ridges. The interest points aresimilar to those detected by a Laplacian operator but a functionbased on the determinant of the Hessian matrix [21] penalizesvery long structures for which the second derivative in one par-ticular orientation is very small. A local maximum of the deter-minant indicates the presence of a bob structure.

C. Maximally Stable Extremal Region (MSER) Detector

A maximally stable extremal region (MSER) [22] is a con-nected component of an appropriately thresholded image. Theword extremal refers to the property that all pixels inside theMSER have either higher (bright extremal regions) or lower(dark extremal regions) intensity than all the pixels on its outerboundary. The maximally stable in MSER describes the prop-erty optimized in the threshold selection process.

The set of extremal regions , i.e., the set of all connectedcomponents obtained by thresholding, has a number of desir-able properties. First, a monotonic change of image intensitiesleaves unchanged, since it depends only on the ordering of pixel intensities which is preserved under monotonic transfor-

mation. This ensures that common photometric changes mod-elled locally as linear or af ne leave unaffected. Secondly,continuous geometric transformations preserve topology-pixelsfrom a single connected component are transformed to a singleconnected component. Thus, after a geometric change locallyapproximated by an af ne transform, homography or even con-tinuous nonlinear warping, a matching extremal region will bein the transformed set . Finally, there are no more extremalregions than there are pixels in the image. So, a set of regionswas de ned that is preserved under a broad class of geometricand photometric changes and yet has the same cardinality as,e.g., the set of xed-sized square windows commonly used innarrow-baseline matching.

For a detailed description of particular techniques to selectextremal regions, the reader is referred to [22] .

D. Feature Detection Based on End-Stopping Behavior of theVisual System

1) End-Stopped Wavelets: Psychovisual studies have identi-ed the presence of certain cells, called hypercomplex or end-stopped cells, in the primary visual cortex [13]. For real-worldscenes, these cells respond strongly to extremely robust image

features such as corner like stimuli and points of high curva-ture [23] , [14] . The term end-stopped comes from the strongsensitivity of these cells to end-points of linear structures. Bhat-tacherjee et al. [14] construct end-stopped wavelets to cap-ture this behavior. The construction of the wavelet kernel (orbasis function) combines two operations. First, linear structureshaving a certain orientation are selected. These linear structuresare then processed to detect line-ends (corners) and or high cur-vature points.

Morlet wavelets can be used to detect linear structures havinga speci c orientation. In the spatial domain, the 2-D Morletwavelet is given by [24]

(7)

where represents 2-D spatial coordinates, andis the wave-vector of the mother wavelet, which deter-

mines scale-resolving power andangular-resolving power of thewavelet [24] . The frequency domain representation of aMorlet wavelet is

(8)

Here, represents the 2-D frequency variable . TheMorlet function is similar to the Gabor function, but with anextra correction term to make it an admis-sible wavelet. The orientation of the wave-vector determinesthe orientation tuning of the lter. A Morlet wavelet detectslinear structures oriented perpendicular to the orientation of thewavelet.

In two dimensions, the end points of linear structures can bedetected by applying the rst derivative of Gaussian (FDoG)lter in the direction parallel to the orientation of structures inquestion. The rst ltering stage detects lines having a speci corientation and the second ltering stage detects end-points of

such lines. These two stages can be combined into a single lterto form an end-stopped wavelet [14] . An example of an end-stopped wavelet and its 2-D Fourier transform is as follows:

(9)

(10)

Equation (10) shows as a product of two factors. The rstfactor is a Morlet wavelet oriented along the axis. The secondfactor is a FDoG operator applied along the frequency axis

, i.e., in the direction perpendicular to the Morlet wavelet.Hence, this wavelet detects line ends and high curvature points


5/14


Fig. 2. Behavior of the end-stopped wavelet on a synthetic image: note the strong response to lineends and corners. (a) Synthetic L-shaped image. (b) Responseof a Morlet wavelet, orientation = 0 . (c) Response of the end-stopped wavelet.

in the vertical direction. Fig. 2 illustrates the behavior of theend-stopped wavelet as in (9) and (10) . Fig. 2(a) shows a syn-thetic image with the L-shaped region surrounded by a black background. Fig. 2(b) shows the raw response of the verticallyoriented Morlet wavelet at scale . Note that this waveletresponds only to the vertical edges in the input. The responseof the end-stopped wavelet is shown in Fig. 2(c) also at scale

. The responses are strongest at end-points of verticalstructures and negligibly small elsewhere. The local maximaof these responses in general correspond to corner-like stimuliand high curvature points in images.

2) Proposed Feature Detection Method: Our approach tofeature detection computes a wavelet transform based on anend-stopped wavelet obtained by applying the FDoG operatorto the Morlet wavelet

(11)

Orientation tuning is given by . Let theorientation range be discretized into intervals and thescale parameter be sampled exponentially as . Thisresults in the wavelet family

(12)

where . The wavelet transform

is

(13)

The sampling parameter is chosen to be 2.Feature detection method that preserves the signi cant image

geometry feature points of an image:1) Compute the wavelet transform in (13) at a suitably chosen

scale for several different orientations. The coarsest scale( ) is not selected as it is too sensitive to global varia-tions. The ner the scale, the more sensitive it is to distor-tions, such as quantization noise. We choose .

2) Locations ( ) in the image that are identi ed as candi-date feature points satisfy

(14)

where represents the local neighborhood of ( )within which the search is conducted.

3) From the candidate points selected in step 2), qualify alocation as a nal feature point if

(15)

where is a user-de ned threshold.

In the the proposed feature detection method, Step 1) com-putes the wavelet transform in (13) for each image location. Step2) identi es signi cant features by looking for local maxima of the magnitude of the wavelet coef cients in a preselected neigh-borhood. We chose a circular neighborhood to avoid increasingdetector anisotropy. Step 3) applies thresholding to eliminatesspurious local maxima in featureless regions of the image.

The feature detection methodhas twofree parameters: integerscale and real threshold . The threshold is adapted to se-lect a xed number (user de ned parameter ) of feature pointsfrom the image. An image feature vector is formed by collectingthe magnitudes of the wavelet coef cients at the selected featurepoints.

E. Detector Evaluation

To evaluate a detector for the robust hashing application, weemploy a robustness function de ned as

(16)

where represents the total number of feature points detected in the original image. denotes thenumber of feature points after the image goes a content pre-serving (but lossy) transformation. Similarly, representsthe number of feature points that are removed as a result of thetransformation, and is the number of new feature points


6/14


Fig. 3. Detector benchmark R for the four detectors across four different images. Note, the closer R is to 1, the better the performance. (Color versionavailable online at http://ieeexplore.ieee.org.)

that may be introduced. Clearly, the maximum value of thisfunction is 1 which happens when .

We obtained this function for the four feature point detectorsreviewed above under a large class of perceptually insigni cantand typically allowable transformations including JPEG imagecompression, rotation, scaling, additive white Gaussian noiseaddition, print-scan and linear/nonlinear image ltering opera-tions. Then an averaged evaluation measure was obtainedby averaging the values of from each transformation.

A plot of for tests on four different images is plotted forthe four detectors in Fig. 3 . Clearly, the best values of areobtained for the feature detector based on end-stopped waveletsfollowed by MSER. In practice, we observed that while mostdetectors fared comparably under geometric attacks of rotation,scaling and translation, the real gains of the end-stopped waveletbased detector were under lossy transformations such as com-pression, noise addition and nonlinear ltering.

IV. P ROBABILISTIC QUANTIZATION

Let denote the length feature vector. The next step then isto obtain a binary string from the feature vector that would formthe intermediate hash. Previous approaches have [3] , [11] usedpublic-key encryption methods on image features to arrive at adigital (binary) signature. Such a signature would be very sen-sitive to small perturbations in the extracted features (here, themagnitude of the wavelet coef cients). We observe that underperceptually insigni cant distortions to the image, although theactual magnitudes of the waveletcoef cientsassociated with thefeature points may change, the distribution of the magnitudesof the wavelet coef cients is still preserved.

In order to maintain robustness, we propose a quantizationscheme based on the probability distribution of the features ex-tracted from the image. In particular, we use the normalized his-togram of the feature vector as an estimate of its distribution.The normalized histogram appears to be largely insensitive toattacks that do not cause signi cant perceptual changes. In ad-dition, a randomization rule [25] is also speci ed which addsunpredictability to the quantizer output.

Let be the number of quantization levels, denote thequantized version of and denote the th elementsof and , respectively. The binary string obtained from thequantized feature vector is, hence, of length bits.If quantization were deterministic, the quantization rule wouldbe given by

(17)

where is the th quantization bin. Note, the quantizedvalues are chosen to be . This is because un-like traditional quantization for compression, there is no con-straint on the quantization levels for the hashing problem. Thesemay, hence, be designed to convenience as long as the notionof closeness is preserved. Here, we design quantization bins

such that

(18)

where is the estimated distribution of . This ensures thatthe quantization levels are selected according to the distribution


7/14


of image features. In each interval , we obtain centerpoints with respect to the distribution, given by

(19)

Then, we nd deviations about whereand , such that

(20)

are, hence, symmetric around with respect to the dis-tribution . By virtue of the design of s in (19) , the de-nominators in (20) are both equal to and, hence, onlythe numerators need to be computed. The probabilistic quanti-zation rule is then completely given by (21) and (22), shown atthe bottom of the page.The output of the quantizer is determin-istic except in the interval Note, if for some

then the assignment to levels or takes place with equalprobability, i.e., 0.5, the quantizer output in other words is com-pletely randomized. On the other hand, as approachesor the quantization decision becomes almost deterministic.

In the next section, we present iterative algorithms that em-ploy the feature detector in Section III-D2 , and the quantizationscheme described in this section to construct binary interme-diate hash vectors.

V. INTERMEDIATE HASH ALGORITHMS

A. Algorithm 1 DeterministicThe intermediate hash function for image is represented as

and let denote the normalized Hamming distancebetween its arguments (binary strings).

Mihcak et al. [9] observe that primary geometric features of the image are largely invariant under small perturbations to theimage. They propose an iterative ltering scheme that mini-mizes the presence of geometrically weak components andenhances geometrically strong components by means of re-gion growing . We adapt the algorithm in [9] to lock onto a set of feature-points that are largely preserved in perceptually similarversions of the image. The stopping criterion for our proposediterative algorithm is achieving a xed point for the binary stringobtained on quantizing the vector of feature points .

The deterministic intermediate hash algorithm is describednext.

Deterministic intermediate hash algorithm.1) Get parameters and (number of features),

and set .2) Use the feature detector proposed in the feature detection

method to extract the length of feature vector .3) Quantize according to the rule given by (17) and (18) (i.e.,deterministic quantization) to obtain a binary string .

4) (Perform order-statistics ltering) Letwhich is the 2-D order statistics ltering of the input . Fora 2-D input where isequal to the element of thesorted set of where

and. Note, for , this is the

same as median ltering.5) Perform low-pass linear shift invariant ltering on to

obtain .6) Repeat steps 2) and 3) with to obtain .7) If , go to step 8);

else if , go to step 8);else set and go to step 2).

8) Set .Step 4) eliminates isolated signi cant components. Step 5)

preserves the geometrically strong components by low-passltering (which introduces blurred regions). The success of thedeterministic algorithm relies upon the self-correcting nature of the iterative algorithm as well as the robustness of the featuredetector. The above iterative algorithm is fairly general in thatany feature detector that extracts visually robust image featuresmay be used.

B. Algorithm 2 Randomized

Randomizing the hash output is desirable not only for secu-rity against inputs designed by an adversary (malicious attacks),but also for scalability, i.e., the ability to work with large datasets while keeping the collision probability for distinct inputs incheck. The hash algorithm presented in Section V-A does notmake use of a secret key, and, hence, there is no randomness in-volved.

In this section, we will construct randomized hash algorithmsusing a secret key , which is used as the seed to the pseudo-random number generator for the randomization steps in the al-gorithm. For this reason, we now denote the intermediate hashvectoras , i.e., functionof both the image, andthe secret

with probability

with probability

with probability (21)

with probability (22)


8/14


Fig. 4. Examples of random partitioning of the Lena image into N = 1 3 rectangles. Note that the random regions vary signi cantly based on the secret key.(a) Secret key K . (b) Secret key K . (c) Secret key K .

key. We present a scheme that employs a random partitioningof the image to introduce unpredictability in the hash values. A

stepwise description is given next. Randomized intermediate hash algorithm.1) (Random Partitioning) Divide the image into (overlap-

ping) random regions. In general, this can be done in sev-eral ways. The main criterion is that a different partitioningshould be obtained (with high probability) as the secret keyis varied. In our implementation, we divide the image intooverlapping circular/elliptical regions with randomly se-lected radii. Label these regions as .

2) (Regtangularization) Approximate each by a rectangleusing a water lling like approach. Label the resultingrandom rectangles (consistent with the labels in step 1) as

.3) (Feature Extraction) Apply Algorithm 1 on all , denote

the binary string extracted from each as . Concatenateall s into a single binary vector of length bits.

4) (Randomized Subspace Projection) Let be the de-sired length of . Randomly choose distinct indices

such that each .5) The intermediate hash .Qualitatively, Algorithm 2 enhances the security of the hash

by employing Algorithm 1 3 on randomly chosen regions orsubimages. As long as these subimages are suf ciently un-predictable (i.e., they differ signi cantly as the secret key isvaried), then the resulting intermediate hashes are also different

with high probability. 4 Examples of random partitioning of the Lena image using Algorithm 2, are shown in Fig. 4 . In eachcase, i.e., Fig.4(a) (c) , a different secret key was used.

The approach of dividing the image into random rectanglesfor constructing hashes was rst proposed by Venkatesan et al.in [5]. However, their algorithm is based on image statistics. Inour framework, by applying the feature point detector to thesesemi-global rectangles, we gain an additional advantage in cap-turing any local tampering of image data (results presented later

3This would now use a probabilistic quantizer.4Although we do not implement it, as suggested by an anonymous reviewer

another avenue for randomization is to employ the feature detector proposed in

the feature detection method on randomly chosen wavelet scales. In that case,however, the choice of wavelet coef cients (from different scales) should bedone very carefully to retain necessary hash robustness.

in Section VI-B ). These rectangles in Fig. 4 are deliberatelychosen to be overlapping to further reduce the vulnerability of

the algorithm to malicious tampering. Finally, the randomizedsubspace projection step adds even more unpredictability to theintermediate hash. Tradeoffs among randomization, fragility,and perceptual robustness are analyzed later in Section VI-C .

VI. R ESULTS

We compare the binary intermediate hash vectors obtainedfrom two different images for closeness in (normalized) Ham-ming distance. Recall from Section II-A thatdenote a pair of perceptually identical images, and likewise

represent perceptually distinct images. Then, werequire

(23)

(24)

where the natural constraints apply. For resultspresented in Section VI-A and B, the following parameterswere chosen for Algorithm 1: a circular (search) neighborhoodof three pixels was used in the feature detector, fea-tures were extracted, the order statistics ltering wasand a zero-phase 2-D FIR low-pass lter of size 5 5 designedusing McClellan transformations [26] was employed. For Algo-rithm 2, the same parameters were used except that the image

was partitioned into random regions. For this choice of parameters, we experimentally determine , and .A more elaborate discussion of how to choose the best , andwill be given in Section VI-D . All input images were resized to512 512 using bicubic interpolation [27] . For color images,both Algorithm 1 and 2 were applied to the luminance planesince it contains most of the geometric information.

A. Robustness Under Perceptually Insigni cant Modi cations

Fig. 5(a) (d) shows four perceptually identical images. Theextracted feature points at algorithm convergence are overlayedon the images. The original bridge image is shown in Fig. 5(a) .Fig. 5(b) (d) , respectively, is the image in Fig. 5(a) attacked byJPEG compression with quality factor (QF) of 20, rotation of 2with scaling, and the Stirmark local geometric attack [28] . It can


9/14


Fig. 5. Original/attacked images with feature points at algorithm convergence. Feature points overlaid on images. (a) Original Image. (b) JPEG, Q F = 1 0 . (c) 2rotation and scaling. (d) Stirmark local geometric attack.

be seen that the features extracted from these images are largelyinvariant.

Table I then tabulates the quantitative deviation as the normal-ized Hamming distance between the intermediate hash valuesof the original and manipulated images for various perceptu-ally insigni cant distortions. The distorted images were gener-ated using the Stirmark benchmark software [28] . The results inTable I reveal that the deviation is less than 0.2 except for largerotation (greater than 5 ), and cropping (more than 20%). Moresevere geometric attacks like large rotation, af ne transforma-tions, and cropping can be handled by a search based scheme.

Details may be found in [29].

B. Fragility To Content Changes

The essence of our feature point based hashing scheme liesin projecting the image onto a visually meaningful waveletbasis, and then retaining the strongest coef cients to form thecontent descriptor (or hash). Our particular choice of the basisfunctions,i.e., end-stopped typeexponentialkernels,yield strongresponses in parts of the image where the signi cant imagegeometry lies. It is this very characteristic that makes ourschemeattractive for detecting content changing image manipulations.In particular, we observe that a visually meaningful contentchange is effected by making a signi cant change to the imagegeometry.

TABLE INORMALIZED HAMMING DISTANCE BETWEEN INTERMEDIATE HASH VALUES

OF ORIGINAL AND ATTACKED (PERCEPTUALLY IDENTICAL ) IMAGES

Fig. 5 shows two examples of malicious content changing ma-nipulation of image data and the response of our feature ex-


10/14


TABLE IINORMALIZED HAMMING DISTANCE BETWEEN INTERMEDIATE

HASH VALUES OF ORIGINAL AND ATTACKED IMAGESVIA CONTENT CHANGING MANIPULATIONS

tractor to those manipulations. Fig. 5(a) shows the original toysimage. Fig. 5(b) then shows a tampered version of the image inFig. 5(a) , where the tampering is being brought about by addi-tion of a toy bus. In Fig. 5(d) , an example of malicious tam-pering is shown where the face of the lady in Fig. 5(c) has beenreplaced by a different face from an altogether different image.

Comparing Fig. 5(a) and (b) and Fig. 5(c) and (d) , it may beseen that several extracted features do not match. This obser-vation is natural since our algorithm is based on extracting the

strongest geometric features from the image. In particular, inFig. 5(d) , tampering of the lady s face is easily detected sincemost differences from Fig. 5(c) are seen in that region. Quan-titatively, this translates into a large distance between the inter-mediate hash vectors.

With complete knowledge of the iterative feature extractionalgorithm, it may still be possible for a malicious adversary togenerate inputs (pairs of images) that defeat our hash algorithm,e.g., tamper content in a manner such that the resulting fea-tures/intermediate hashes are still close. This, however, is muchharder to achieve, when the randomized hash algorithm (Algo-rithm 2) were used.

We also tested under several other content changing attacksincluding object insertion and removal, addition of excessivenoise, alteration of the position of image elements, tamperingwith facial features, and alteration of a signi cant image char-acteristic such as texture and structure. In all cases, the detectionwas accurate. That is, the normalized Hamming distance be-tween the image and its attacked version was found to be greaterthan 0.3. Table II shows the normalized Hamming distance be-tween intermediate hash values of original and maliciously tam-pered images for many different content changing attacks. Al-

gorithm 2 with was used for these results.

C. Performance Tradeoffs

A large search neighborhood implies the maxima of waveletresponses are taken over a larger set, and, hence, the featurepoints are more robust. Likewise, consider selecting the featurepoints so that . Note the featuredetection scheme as described in the feature detection methodimplicitly assumes to be in nity. If and are chosento be large enough, then the resulting feature points are very ro-bust, i.e., retained in several attacked versionsof the image. Sim-ilarly, if the two thresholds are chosen to be very low, then theresulting features tend to be easily removed by several percep-tually insigni cant modi cations. The thresholds and the size

of the search neighborhood facilitate a perceptual robustnessversus fragility tradeoff.

When the number of random partitions is one, and a de-terministic quantization rule is employed in Section IV , Algo-rithms 1 and 2 are the same. If is very large, then the randomregions shrink to an extent that they do not contain signi cant

chunks of geometrically strong components, and, hence, the re-sulting features are not robust. The parameter facilitates arandomness versus perceptual robustness tradeoff.

Recall from Section IV that the output of the quantizationscheme for binarizing the featurevector is completely determin-istic except for the interval . In general, more than onechoice of the pair may satisfy (20) . Trivial solutions to(20) are (a) and (b) . While (a)corresponds to the case when there is no randomness involved,the choice in (b) entails that the output of the quantizer is al-ways decided by a randomization rule. In general, the greaterthe value of , the more the amountof unpredictability in the output. This is a desired property tominimize collision probability. However, this also increases thechance that slight modi cations to the image result in differenthashes. A tradeoff is, hence, facilitated between perceptual ro-bustness and randomization.

D. Statistical Analysis Via ROC Curves

In this section, we present a detailed statistical comparison of our proposed feature-point scheme for hashing against methodsbased on preserving coarse image representations. In particular,we compare the performance of our intermediate hash based onthe end-stopped wavelet transform against the discrete wavelettransform (DWT) and the discrete cosine transform (DCT).

Let denote the family of perceptually insigni cant attackson an image , and let be a speci c attack. Likewise,let represent the family of content changing attacks on , andlet be a speci c content changing attack. Then, we de nethe following terms.

Probability of False Positive :

(25)

Probability of False Negative :

(26)

To simplify the presentation, we construct two representativeattacks. A strong perceptually insigni cant attack in : A com-

posite attack was constructed for this purpose. The com-plete attack (in order) is described as: 1) JPEG compres-sion with , 2) 3 rotation and rescaling to theoriginal size, 3) 10% cropping from the edges, and 4) Ad-ditiveWhite Gaussian Noise (AWGN) with (imagepixel values range in 0 255). Fig. 6(a) (e) shows the orig-inal and modi ed house images at each stage of this attack.

A content changing attack in : The content changingattack consisted of maliciously replacing (a randomly se-lected) region of the image by an alternate unrelated image.An example of this attack for the Lena image is shown inFig. 7 .


11/14


Fig. 6. Content changing attacks and feature extractor response. Feature points overlayed on the images. (a) Original toys image. (b) Tampered toys image.(c) Original Clinton image. (d) Tampered Clinton image.

For xed and , the probabilities in (25) and (26) are com-

puted by applying the aforementioned attacks to a natural imagedatabase of 1000 images and recording the failure cases. Asand are varied, and describe an ROC (receiveroperating characteristic) curve.

All images were resized to 512 512 prior to applying thehash algorithms. For the results to follow, our intermediate hashwas formed as described in Section V-A by retaining thestrongest features. The intermediate hash/feature vector in theDWT based scheme was formed by retaining the lowest resolu-tion subband in an -level DWT decomposition. In the DCTscheme, correspondingly, a certain percentage of the total DCTcoef cients were retained. These coef cients would in general

belong to a low frequency band (but not including DC, since itis too sensitive to scaling and/or contrast changes).Fig. 9 shows ROC curves for the three schemes for extracting

intermediate features of images: 1) preserving low-frequencyDCT coef cients, 2) low resolution wavelet coef cients, and 3)the proposed scheme based on end-stopped kernels. Each pointon these curves represents a pair computed as in (25)and (26) for a xed and . We used in all cases. Forthe ROC curves in Fig. 9, we varied in the range [0.1, 0.3]. Atypical application, e.g., image authentication or indexing, willchoose to operate at a point on this curve.

To ensure a fair comparison among the three schemes, weconsider two cases for each hashing method. For the DWT, weshow ROC curves when a six-level DWT and ve-level DWTtransform was applied. A six-level DWT on a 512 512 image

implies that 64 transform coef cients are retained. In a ve-level

DWT similarly, 256 coef cients are retained. Similarly, for theDCT based scheme we show two different curves in Fig. 9 , re-spectively, corresponding to 64 and 256 low-frequency DCTcoef cients. For our intermediate hash, we show curves corre-sponding to and .

In Fig. 9 , both the false positive as well as the false negativeprobabilities are much lower for our proposed hash algorithm.Predictably, as the number of coef cients in the intermediatehash are increased for either scheme, a lower false positiveprobability (i.e., less collisions of perceptually distinct images)is obtained at the expense of increasing the false negativeprobability. Recall,however, from Section VI-C that this tradeoff

can be facilitated in our intermediate hash even with a xednumber of coef cients an option that the DWT/DCT doesnot have.

In Fig. 9 , with 64 features, our hash algorithm basedon end-stopped kernels vastly outperforms the DCT as well asDWT based hashes 5 in achieving lower false positive probabil-ities, even as a much larger number of coef cients is used forthem. We also tested our hash algorithms on all possible pair-ings (499500) of the 1000 distinct images in our experiments.Only 2 collision cases (same/close intermediate hash vectors forvisually distinct images) were observed with 100, and vewith 64.

5Allthe wavelet transforms in theMATLAB wavelet toolbox version 7.0weretested. The results shown here are for the discrete Meyer wavelet dmey which

gave the best results amongst all DWT families.


12/14


Fig. 7. Representative perceptually insigni cant attack on the house image: images after each stage of the attack. (a) Original house image. (b) JPEG, Q F = 2 0 .(c) Image in (b) after 3 rotation and scaling. (d) Image in (c) cropped 10% on the sides and rescaled to original size. (e) Final attacked image: AWGN attack onthe image in (d).

VII. D ISCUSSIONS AND CONCLUSION

This paper proposes a two-stage framework for perceptually-based image hashing consisting of visually robust feature ex-traction (intermediate hash) followed by feature vector com-pression ( nal hash). A general framework for constructing in-termediate hash vectors from imagesvia visually signi cant fea-ture points is presented.

An iterative feature extraction algorithm based on preservingsigni cant image geometry is proposed. Several robust featuredetectors may be used within the iterative algorithm. Parameters

in our feature detector enable tradeoffs between robustness andfragility of the hash, which are otherwise hard to achieve withtraditional DCT/DWT based approaches.

We develop both deterministic and randomized algorithms.Randomization is particularly desirable is adversarial scenariosto lend unpredictability to the hash and reduce its vulnerabilityto attacks by a malicious adversary. ROC analysis is performedto demonstrate the statistical advantages of our hash algorithm

over existing schemes based on preserving coarse image repre-sentations.


13/14


Fig. 8. Example of the representative content changing attack on the Lena image: 15% of the image area isbeing corrupted. (a) OriginalLena image. (b) TamperedLena image.

Fig. 9. ROC curves for hash algorithms based on three approaches: DCT transform, DWT transform, proposed intermediate hash based on end-stopped kernels.Note that error probabilities are signi cantly lower for the proposed scheme. (Color version available online at http://ieeexplore.ieee.org.)

It is useful to think of features extracted via our randomizedhash algorithm as a pseudorandom signal representation schemefor images, i.e., a different representation, each suf cient tocharacterize the image content, is obtained (with high proba-bility) as the secret key is varied. Future work could explorealternate pseudorandom signal representations for image iden-tication and hashing. In particular, the goal of secure hashingcan be understood as developing the pseudorandom image rep-resentation that leaks the minimum amount of information aboutthe image.

REFERENCES

[1] A. Menezes, V. Oorschot,and S. Vanstone , Handbook of Applied Cryp-tography . Boca Raton, FL: CRC, 1998.

[2] K. Mihcak, R. Venkatesan, and T. Liu, Watermarking via optimiza-tion algorithms for quantizing randomized semi-global image statis-tics, ACM Multimedia Syst. J. , Apr. 2005.

[3] M. Schneiderand S. F.Chang, A robust content baseddigital signaturefor image authentication, in Proc. IEEE Conf. Image Processing , Sep.1996, vol. 3, pp. 227 230.

[4] C. Kailasanathan and R. S. Naini, Image authentication surviving ac-ceptable modi cations using statistical measures and k-mean segmen-tation, presented at the IEEE-EURASIP Workshop Nonlinear Signaland Image, Jun. 2001.

[5] R. Venkatesan, S. M. Koon, M. H. Jakubowski, and P. Moulin, Robustimage hashing, in Proc. IEEE Conf. Image Processing , Sep. 2000, pp.664 666.

[6] C. Y. Lin and S. F. Chang, A robust image authentication systemdistingushing JPEG compression from malicious manipulation, IEEE Trans. Circuits Syst. Video Technol. , vol. 11, no. 2, pp. 153 168, Feb.2001.

[7] C.-S. Lu and H.-Y. M. Liao, Structural digital signature for imageauthentication, IEEE Trans. Multimedia , vol. 5, no. 3, pp. 161 173,Jun. 2003.

[8] J. Fridrich and M. Goljan, Robust hash functions for digital water-marking, in Proc. IEEE Int. Conf. Information Technology: Codingand Computing , Mar. 2000, pp. 178 183.

[9] K. Mihcak and R. Venkatesan, New iterative geometric techniques forrobust image hashing, in Proc. ACM Workshop on Security and Pri-vacy in Digital Rights Management Workshop , Nov. 2001, pp. 13 21.


14/14


[10] S. S. Kozat, K. Mihcak, and R. Venkatesan, Robust perceptual imagehashing via matrixinvariances, in Proc. IEEE Conf. ImageProcessing ,Oct. 2004, pp. 3443 3446.

[11] S. Bhatacherjee and M. Kutter, Compression tolerant image authenti-cation, presented at the IEEE Conf. Image Processing, 1998.

[12] J. Dittman, A. Steinmetz, and R. Steinmetz, Content based digitalsignature for motion picture authentication and content-fragile water-marking, in Proc. IEEE Int. Conf. Multimedia Computing and Sys-

tems , 1999, pp. 209 213.[13] D. H. Hubel and T. N. Wiesel, Receptive elds and functional archi-tecture in two nonstriate visual areas of the cat, J. Neurophysiol. , pp.229 289, 1965.

[14] S. Bhatacherjee and P. Vandergheynst, End-stopped wavelets fordetection low-level features, in Proc. SPIE Wavelet Applications inSignal and Image Processing VII , 1999, pp. 732 741.

[15] M. Johnson and K. Ramachandran, Dither-based secure imagehashing using distributed coding, in Proc. IEEE Conf. Image Pro-cessing , 2003.

[16] V. Monga, A. Banerjee, and B. L. Evans, Clustering algorithms forperceptual image hashing, in Proc. IEEE Digital Signal ProcessingWorkshop. , Aug. 2004, pp. 283 287.

[17] V. Monga, A. Banerjee, and B. L. Evans, A clustering based approachto perceptual image hashing, IEEE Trans. Signal Process. , to be pub-lished.

[18] K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F.

Schaffalitzky, T. Kadir,and L. Van Gool, A compasionof af ne regiondetectors, Int. J. Comput. Vis. , to be published.

[19] C. Harris and M. Stephen, A combined corner and edge detector, inProc. Alvey Visual Conf. , Sep. 1988, pp. 147 151.

[20] H. P. Moravec, Obstacle avoidanceand navigation in thereal world bya seen robot rover, Robotics Inst., Carnegie-Mellon Univ., Pittsburgh,PA, Tech. Rep. CMU-RI-TR-3, Sep. 1980.

[21] K. Mikolajczyk and C. Schmid, Scale and af ne invariant interestpoint detectors, Int. J. Comput. Vis. , pp. 6368, 2004.

[22] J. Matas, O. Schum, M. Urban, and T. Pajdla, Robust wide-baselinestereo from maximally stable extremal regions, in Proc. Brit. Mach.Vision Conf. , 2002, pp. 384 393.

[23] A. Dobbins, S. W. Zucker, and M. S. Cynader, End-stopping and cur-vature, Vis. Res. , pp. 1371 1387, 1989.

[24] J.-P. Antoine and R. Murenzi, Two-dimensional directional waveletsand the scale-angle representation, Signal Process. , pp. 259281,1996.

[25] R. Motwani and P. Raghavan , Randomized Algorithms . Cambridge,U.K.: Cambridge Univ. Press, 1996.

[26] D. E. Dudgeon and R. M. Mersereau , Multidimensional Digital SignalProcessing . Englewood Cliffs, NJ: Prentice-Hall, 1984.

[27] G. Sharma , Digital Color Imaging Handbook . Boca Raton,FL: CRC,2002.

[28] FairEvaluation Procedures for Watermarking Systems, 2000. [Online].Available: http://www.petitcolas.net/fabien/watermarking/stirmark

[29] V. Monga, D. Vats, and B. L. Evans, Image authentication under geo-metric attacks via structure matching, presented at the IEEE Int. Conf.Multimedia and Expo Jul. 2005.

VishalMonga (S97M06) received theB.Tech.de-gree in electrical engineering from the Indian Insti-tute of Technology(IIT),Guwahati, in May2001,andthe M.S.E.E. and Ph.D. degrees from The Universityof Texas, Austin, in May 2003 and August 2005, re-spectively.

Since Oct 2005, he has been a Member of Re-search Staff within the Xerox Innovation Group, ElSegundo, CA. His research interests span signal andimage processing, applied linear algebra, and infor-mation theory. He has researched problems in color

imaging, particularly halftoning and device color correction, information em-bedding in multimedia, and multimedia security.

Dr. Monga is a member of SPIE and IS&T. He received the IS&T RaymondDavis scholarship in 2004, a Texas Telecommunications Consortium (TxTec)Graduate Fellowship from The University of Texas for the year 2002 to 2003,and the President s Silver Medal in 2001 from IIT, Guwahati.

Brian L. Evans (SM 97) received the B.S.E.E.C.S.degree from the Rose-Hulman Institute of Tech-nology, Terre Haute, IN, in 1987, and the M.S.E.E.and Ph.D.E.E. degrees from the Georgia Institute of Technology, Atlanta, in 1988 and 1993, respectively.

From 1993 to 1996, he was a PostdoctoralResearcher at the University of California, Berkeley.Since 1996, he has been with the Department of Electrical and Computer Engineering at The Uni-versity of Texas, Austin, where he is currently aProfessor. In signal processing, his research group

is focused on the design and real-time software implementation of wirelessOFDM base stations and ADSL transceivers for high-speed Internet access.His group developed the rst time-domain equalizer training method that maxi-mizes a measure of bit rate and is realizable in real time in xed-point software.In image processing, his group is focused on the design and real-time softwareimplementation of high-quality halftoning for desktop printers, multimediasecurity and search, and smart image acquisition for digital still cameras.

Dr.Evansis a memberof theDesign andImplementation of SignalProcessingSystems Technical Committee of the IEEE Signal Processing Society. He is therecipient of a 1997 U.S. National Science Foundation CAREER Award.

monga paper 1

Documents