
ORIGINAL ARTICLE

Gujrati character recognition using weighted k-NN and Mean χ² distance measure

Jayashree Rajesh Prasad • Uday Kulkarni

Received: 21 August 2012 / Accepted: 23 July 2013
© Springer-Verlag Berlin Heidelberg 2013

J. R. Prasad (corresponding author), Department of Computer Engineering, Vishwakarma Institute of Information Technology, Pune, India. e-mail: [email protected]

U. Kulkarni, Department of Computer Engineering, SGGS Institute of Engineering and Technology, Nanded, India. e-mail: [email protected]

Int. J. Mach. Learn. & Cyber., DOI 10.1007/s13042-013-0187-z

Abstract  With advances in the field of digitization, document analysis and handwriting recognition have emerged as key research areas. The authors present a handwritten character recognition system for Gujrati, an Indian language spoken by 40 million people. The proposed system extracts four features. A unique pattern descriptor and the Gabor phase XNOR pattern are two features newly proposed for the isolated handwritten character set of Gujrati. In addition to these two features, we use contour direction probability distribution function and autocorrelation features. The next contribution is the weighted k-NN classifier. The final contribution of this research is a novel Mean χ² distance measure. The proposed classifier exploits a combination of feature weights and the new distance measure, along with a triangular distance and the Euclidean distance, for performance that improves upon the conventional k-NN classifier. The implementation on a comprehensive data set shows 86.33 % recognition efficiency. The results show that the proposed approach outperforms conventional k-NN. It is concluded that despite the shape ambiguities in Indian scripts, the proposed classification algorithm could be a dominant technique in the field of handwritten character recognition.

Keywords  Optical character recognition (OCR) • Weighted k-NN classification • Gabor phase XNOR pattern

1 Introduction

Development of complete OCR systems for Indian language scripts is challenging, and the field is still in its infancy. It is well known that statistical and structural approaches to OCR have specific advantages and disadvantages. The authors propose hybrid feature extraction techniques to leverage the advantages of both approaches; hybrid approaches overcome the problems associated with statistical and structural methods when these are used independently. With this in view, the proposed features are as follows:

1. Pattern descriptor
2. Gabor phase XNOR pattern (GPXNP)
3. Contour direction probability distribution function (CDPDF)
4. Autocorrelation

Structural properties of Gujrati script are represented by a novel pattern descriptor and zone profiles, whereas statistical peculiarities are captured by the unique GPXNP, CDPDF and autocorrelation features.

1.1 Characteristics of Gujrati script

Gujrati is an Indian script similar in appearance to other Indo-Aryan scripts. Printed Gujrati script has a rich literary heritage [1, 2]. Gujrati has 12 vowels and 34 consonants, as shown in Fig. 1. Gujrati belongs to the genre of languages that use variants of the Devanagri script [3, 4]. No significant work is found in the literature that addresses the recognition of the Gujrati language [5–7]. Research on Gujrati OCR is still at a nascent stage compared with OCR research on many other scripts.

The reasons behind this reveal peculiarities of Gujrati. For example, some Gujrati characters are very similar in appearance. With sufficient noise, these characters can easily be misclassified. Often, such characters are misclassified even by humans, who then need to use contextual knowledge to correct the error.

1.1.1 System architecture

The system architecture for handwritten Gujrati OCR is shown in Fig. 2. The training phase comprises preprocessing and feature extraction. GPXNP embodies significant discriminating power. A novel pattern descriptor demonstrates the ability to recognize curves, holes and a variety of strokes. The zone profile represents the geometric properties of the character contour. CDPDF provides writer independence by representing the peculiarities of multiple writers. Finally, autocorrelation provides the notion of self-matching.

1.2 Data set description

The availability of a data set that captures the variations encountered in the real world is a critical issue in any experimental research. To the best of our knowledge, no handwritten Gujrati data sets exist [8, 9]. Therefore, 360 samples from different writers were collected for each character in the Gujrati alphabet, i.e. 34 consonants and 12 vowels. This data set thus consists of 16,560 samples altogether. The characters are scanned at 300 dots per inch resolution.

Experiments are executed on unconstrained handwritten characters. The authors aim to develop robust character recognition for unconstrained, i.e. broken or damaged, characters. The data set accommodates noisy and skewed characters equally.

2 Preprocessing

As the first step, image and data preprocessing serve the purpose of extracting regions of interest and enhancing and cleaning up the images, so that they can be directly and efficiently processed by the feature extraction stage [10]. Digital scanners are the default image acquisition devices; they are fast, versatile, mobile and relatively cheap. In OCR applications, however, digital scanners suffer from a number of limitations, e.g. geometrical distortions [11]. Owing to the absence of standard image acquisition procedures for OCR data sets, efficient preprocessing is required [12]. Initially, the scanned images undergo a normalization operation.

Fig. 1  Gujrati script with consonants and vowels

Fig. 2  Weighted k-NN classifier with feature extraction, training and recognition of isolated Gujrati characters (training phase: preprocessing by gray conversion, resizing, normalization, skeletonization and thinning, followed by extraction of the GPXNP, pattern descriptor, CDPDF and autocorrelation features into a library of feature vectors; recognition phase: preprocessing and feature extraction of the test character, the weighted k-NN algorithm against the database, inverse normalization and output)

2.1 Normalization

Character normalization is considered the most important preprocessing operation for character recognition. The general approach to image normalization maps an image onto a standard plane of a predefined size, so as to give a representation of fixed dimensionality for classification. The goal of character normalization is to reduce the within-class variation of the character shapes in order to facilitate the feature extraction process and improve classification accuracy. Character normalization is broadly categorized into three types: linear normalization, moment-based normalization and nonlinear normalization [13].

To increase both the accuracy and the interpretability of the digital data during the image processing phase, normalization is performed in the preprocessing stage. For this, moment-based normalization, a well-known technique in computer vision and pattern recognition applications proposed by [14, 15], is used.

The sole purpose of using moment-based normalization is to obtain a normalized image through a geometric transformation procedure that is invariant to any affine distortion of the image. This enhances the recognition rate even when character samples from different writers exhibit affine geometric variations [15]. Here, the phrase 'affine transformation' refers to a transformation that is a combination of elementary transformations such as translation, rotation or reflection about an axis.

Initially, the original image is resized. It is then converted to gray-scale format and subsequently undergoes the normalization operation. A sequence of geometric transformation operations that is invariant to any affine distortion of the image yields the normalized image, which has a standard size and orientation. Features are extracted from the normalized image, and the normalized image is converted back to its original size and orientation during the recognition phase. Sections 2.2 and 2.3 describe the details of the normalization process.

2.2 Image moments and affine transforms

Let f(x,y) denote a digital image of size M × N. Its geometric moments $M_{pq}$ and central moments $\mu_{pq}$, with p, q = 0, 1, 2, ..., are defined respectively as

$$M_{pq} = \sum_{x=0}^{M-1}\sum_{y=0}^{N-1} x^{p}\, y^{q}\, f(x,y) \qquad (1)$$

and

$$\mu_{pq} = \sum_{x=0}^{M-1}\sum_{y=0}^{N-1} (x-\bar{x})^{p}\,(y-\bar{y})^{q}\, f(x,y) \qquad (2)$$

where

$$\bar{x} = \frac{m_{10}}{m_{00}}, \qquad \bar{y} = \frac{m_{01}}{m_{00}} \qquad (3)$$

An image g(x,y) is said to be an affine transform of f(x,y) if there exist a matrix $A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$ and a vector $d = \begin{pmatrix} d_1 \\ d_2 \end{pmatrix}$ such that g(x,y) = f(x_a, y_a), where

$$\begin{pmatrix} x_a \\ y_a \end{pmatrix} = A \cdot \begin{pmatrix} x \\ y \end{pmatrix} - d. \qquad (4)$$

Affine transformations include shearing in the x direction, denoted $A_x = \begin{pmatrix} 1 & b \\ 0 & 1 \end{pmatrix}$; shearing in the y direction, denoted $A_y = \begin{pmatrix} 1 & 0 \\ c & 1 \end{pmatrix}$; and scaling in both the x and y directions, which corresponds to $A_s = \begin{pmatrix} a & 0 \\ 0 & d \end{pmatrix}$. Moreover, it is straightforward to show that any affine transform A can be decomposed as a composition of the three transforms above, i.e. $A = A_s \circ A_y \circ A_x$, provided that $a_{11} \neq 0$ and $\det(A) \neq 0$.
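As a concrete illustration of Eqs. (1)–(3), the minimal Python/NumPy sketch below computes the geometric and central moments of a gray-scale character image. This is an illustrative assumption of this rewrite, not the authors' Matlab implementation; the row/column axis convention is arbitrary.

```python
import numpy as np

def geometric_moment(f, p, q):
    # M_pq = sum_x sum_y x^p y^q f(x, y), Eq. (1); rows treated as x, columns as y
    x = np.arange(f.shape[0], dtype=float).reshape(-1, 1)
    y = np.arange(f.shape[1], dtype=float).reshape(1, -1)
    return ((x ** p) * (y ** q) * f).sum()

def central_moment(f, p, q):
    # mu_pq about the centroid (x_bar, y_bar) of Eq. (3), as in Eq. (2)
    m00 = geometric_moment(f, 0, 0)
    x_bar = geometric_moment(f, 1, 0) / m00
    y_bar = geometric_moment(f, 0, 1) / m00
    x = np.arange(f.shape[0], dtype=float).reshape(-1, 1)
    y = np.arange(f.shape[1], dtype=float).reshape(1, -1)
    return (((x - x_bar) ** p) * ((y - y_bar) ** q) * f).sum()
```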

2.3 Applying image normalization

The normalization procedure consists of the following steps for a given image f(x,y):

1. Center the image f(x,y). This is achieved by setting in (4) the matrix $A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ and the vector $d = \begin{pmatrix} d_1 \\ d_2 \end{pmatrix}$ with

$$d_1 = \frac{m_{10}}{m_{00}}, \qquad d_2 = \frac{m_{01}}{m_{00}},$$

where $m_{10}$, $m_{01}$ and $m_{00}$ are the moments of f(x,y) as used in (3). This step aims to achieve translation invariance. Let f_1(x,y) denote the resulting centered image, shown in Fig. 3a.

2. Apply a shearing transform to f_1(x,y) in the x direction with matrix $A_x = \begin{pmatrix} 1 & b \\ 0 & 1 \end{pmatrix}$, so that the image shown in Fig. 3b, denoted $f_2(x,y) = A_x[f_1(x,y)]$, is obtained.

3. Apply a shearing transform to f_2(x,y) in the y direction with matrix $A_y = \begin{pmatrix} 1 & 0 \\ c & 1 \end{pmatrix}$, so that the image shown in Fig. 3c, denoted $f_3(x,y) = A_y[f_2(x,y)]$, is obtained.

4. Scale f_3(x,y) in both the x and y directions with $A_s = \begin{pmatrix} a & 0 \\ 0 & d \end{pmatrix}$, so that the image shown in Fig. 3d, denoted $f_4(x,y) = A_s[f_3(x,y)]$, is obtained.


To review the significance of the normalization process, the authors highlight one fact: the handwriting samples from 360 writers reflect a variety of writing styles, stroke directions and character shape variations. Technically, if these style variations are viewed as a general affine transformation, i.e. shearing in both the x and y directions and scaling in both the x and y directions, then the four steps in the normalization procedure are designed to eliminate the effect of each of these distortion or variation components.

Step (1) eliminates the translation of the character image by adjusting the center of the image; steps (2) and (3) eliminate shearing in the x and y directions; step (4) eliminates scaling distortion by forcing the normalized image to a standard size. Figure 3 shows the results of the successive operations on a few characters, such as k, Ka, ca, j and T, during image normalization. The final image (e) shown in these figures is the normalized image, on which subsequent feature extraction is performed. It is important to note that each step in the normalization procedure is readily invertible; this allows the normalized image to be converted back to its original size and orientation.

Fig. 3  (a) Original image of character 'ka'; (b) original image in (a) after translation; (c) image after X shearing; (d) image after Y shearing; (e) image after scaling; (f) normalized image
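A minimal Python sketch of the four-step procedure of Sect. 2.3 follows. The paper does not state how the shear and scale parameters b, c, a and d are estimated, so this sketch uses one common moment-based choice (shears that zero the mixed central moment, and scaling onto a fixed output plane); these criteria, the output size and the use of scipy.ndimage are assumptions of this rewrite, not the authors' exact parameterization.

```python
import numpy as np
from scipy import ndimage

def _centroid_and_mu(f):
    # centroid (Eq. 3) and central moments mu_pq (Eq. 2)
    x = np.arange(f.shape[0], dtype=float).reshape(-1, 1)
    y = np.arange(f.shape[1], dtype=float).reshape(1, -1)
    m00 = f.sum()
    xb, yb = (x * f).sum() / m00, (y * f).sum() / m00
    mu = lambda p, q: (((x - xb) ** p) * ((y - yb) ** q) * f).sum()
    return xb, yb, mu

def _warp(f, A, d, shape):
    # resample so that output(xa, ya) = f(x, y) with (xa, ya) = A.(x, y) - d, Eq. (4)
    Ainv = np.linalg.inv(A)
    return ndimage.affine_transform(f, Ainv, offset=Ainv @ d, output_shape=shape, order=1)

def normalize_character(f, out_size=64):
    f = f.astype(float)
    # Step 1: translate the centroid to the image center (translation invariance)
    xb, yb, _ = _centroid_and_mu(f)
    f1 = _warp(f, np.eye(2), np.array([xb - f.shape[0] / 2, yb - f.shape[1] / 2]), f.shape)
    # Step 2: x-shear about the centroid; b chosen here to zero mu11 (assumed criterion)
    xb, yb, mu = _centroid_and_mu(f1)
    b = -mu(1, 1) / mu(0, 2)
    f2 = _warp(f1, np.array([[1.0, b], [0.0, 1.0]]), np.array([b * yb, 0.0]), f1.shape)
    # Step 3: y-shear about the centroid; c chosen to zero the re-computed mu11
    xb, yb, mu = _centroid_and_mu(f2)
    c = -mu(1, 1) / mu(2, 0)
    f3 = _warp(f2, np.array([[1.0, 0.0], [c, 1.0]]), np.array([0.0, c * xb]), f2.shape)
    # Step 4: scale both axes onto a standard out_size x out_size plane
    a, d_scale = out_size / f3.shape[0], out_size / f3.shape[1]
    return _warp(f3, np.diag([a, d_scale]), np.zeros(2), (out_size, out_size))
```

Each step uses an invertible affine map, so the inverse normalization mentioned in the text amounts to applying the inverses of the same four transforms in reverse order.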

2.4 Skeletonization

The aim of skeletonization is to extract a region-based shape feature representing the general form of an object. It is a common preprocessing operation in pattern recognition. The major skeletonization techniques are:

1. detecting ridges in the distance map of the boundary points,
2. calculating the Voronoi diagram generated by the boundary points, and
3. layer-by-layer erosion, called thinning.

In digital spaces, only an approximation to the "true skeleton" can be extracted. Two requirements must be complied with:

1. topological, i.e. the skeleton must retain the topology of the original object;
2. geometrical, i.e. the skeleton must lie in the middle of the object and be invariant under the most important geometric transformations, including translation, rotation and scaling.

Skeletonization removes pixels on the boundaries of objects but does not allow objects to break apart. The remaining pixels make up the image skeleton, and the operation preserves the Euler number. The authors use a combined skeletonization and thinning approach with the Image Processing Toolbox in Matlab 2011a.

2.5 Thinning

This morphological operation removes selected foreground pixels from binary images, somewhat like erosion or opening. It is particularly used along with skeletonization. These two operations are performed for the purpose of extracting the pattern descriptor illustrated in Sect. 3.2. The authors use the following algorithm [16] for thinning:

1. Divide the image into two distinct subfields in a checkerboard pattern.
2. In the first sub-iteration, delete pixel p from the first subfield if and only if the conditions G1, G2 and G3 are all satisfied.
3. In the second sub-iteration, delete pixel p from the second subfield if and only if the conditions G1, G2 and G3' are all satisfied.

Condition G1: $X_H(p) = 1$, where

$$X_H(p) = \sum_{i=1}^{4} b_i, \qquad b_i = \begin{cases} 1, & \text{if } x_{2i-1} = 0 \text{ and } (x_{2i} = 1 \text{ or } x_{2i+1} = 1) \\ 0, & \text{otherwise} \end{cases}$$

and $x_1, x_2, \ldots, x_8$ are the values of the eight neighbors of p, starting with the east neighbor and numbered in counterclockwise order.

Condition G2: $2 \le \min\{n_1(p), n_2(p)\} \le 3$, where

$$n_1(p) = \sum_{k=1}^{4} x_{2k-1} \lor x_{2k}, \qquad n_2(p) = \sum_{k=1}^{4} x_{2k} \lor x_{2k+1}$$

Condition G3: $(x_2 \lor x_3 \lor \bar{x}_8) \land x_1 = 0$

Condition G3': $(x_6 \lor x_7 \lor \bar{x}_4) \land x_5 = 0$

The two sub-iterations together make up one iteration of the thinning algorithm. When the user specifies an infinite number of iterations, the iterations are repeated until the image stops changing. The conditions are all tested using the applylut function with pre-computed lookup tables in Matlab.
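The authors perform these operations with Matlab's Image Processing Toolbox. A roughly equivalent sketch using scikit-image is shown below purely for illustration (an assumption of this rewrite, not the authors' implementation); skimage.morphology.thin iterates a two-subfield, lookup-table thinning of the kind described above until the image stops changing.

```python
import numpy as np
from skimage.morphology import skeletonize, thin

def skeleton_and_thin(binary_char):
    """binary_char: 2-D boolean array, True on foreground (character) pixels."""
    img = np.asarray(binary_char, dtype=bool)
    skel = skeletonize(img)   # topology-preserving skeleton
    thinned = thin(img)       # iterative two-subfield thinning; runs until stable
    return skel, thinned
```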

3 Proposed feature extraction methods

3.1 Gabor phase XNOR pattern (GPXNP)

Wavelet features have been used for Malayalam OCR [17]. To explore the multi-resolution property of the Gabor wavelet, the authors extract this feature. It is inspired by the Local Gabor XOR Pattern (LGXP), which was proposed for accurate face recognition under uncontrolled conditions [18]. LGXP as illustrated in [18] exhibits a problem, shown in Fig. 4. This system overcomes the problem by modifying LGXP into GPXNP to achieve a better representation, as shown in Fig. 5.

3.1.1 Gabor wavelet representation

Initially, the image is given a Gabor wavelet representation. This is obtained by convolving the image with a Gabor kernel, i.e.

$$G_{\mu,\nu}(z) = I(z) * \psi_{\mu,\nu}(z) \qquad (5)$$

Here $I(\cdot)$ denotes the input image, * denotes the convolution operator, z denotes the pixel, i.e. z = (x,y), and $\psi_{\mu,\nu}(\cdot)$ denotes the Gabor kernel with orientation μ and scale ν, which is defined as

$$\psi_{\mu,\nu}(z) = \frac{\lVert k_{\mu,\nu}\rVert^2}{\sigma^2}\, e^{-\lVert k_{\mu,\nu}\rVert^2 \lVert z\rVert^2 / 2\sigma^2}\left(e^{i k_{\mu,\nu} z} - e^{-\sigma^2/2}\right) \qquad (6)$$

where $\lVert\cdot\rVert$ denotes the norm operator and the wave vector $k_{\mu,\nu}$ is defined as

$$k_{\mu,\nu} = k_{\nu}\, e^{i\phi_{\mu}} \qquad (7)$$

with $k_{\nu} = k_{\max}/f^{\nu}$ and $\phi_{\mu} = \pi\mu/8$; $k_{\max}$ is the maximum frequency and f is the spacing between kernels in the frequency domain.

Next, the Gabor phase and magnitude are computed. For each Gabor kernel, at every image pixel z, a complex number containing two parts, i.e. the real part $\mathrm{Re}_{\mu,\nu}(z)$ and the imaginary part $\mathrm{Im}_{\mu,\nu}(z)$, is generated. Based on these two parts, the magnitude $A_{\mu,\nu}(z)$ and phase $\Phi_{\mu,\nu}(z)$ are computed by Eqs. (8) and (9) respectively:

$$A_{\mu,\nu}(z) = \sqrt{\mathrm{Im}^2_{\mu,\nu}(z) + \mathrm{Re}^2_{\mu,\nu}(z)} \qquad (8)$$

$$\Phi_{\mu,\nu}(z) = \arctan\!\left(\mathrm{Im}_{\mu,\nu}(z)/\mathrm{Re}_{\mu,\nu}(z)\right) \qquad (9)$$

The Gabor phase values obtained from Eq. (9) are represented as a matrix in a 3 × 3 neighborhood in Fig. 4. Under LGXP, two different phase matrices can yield the same decimal equivalent; this leads to an ambiguous representation and may mislead the classifier. The authors therefore propose the modified GPXNP, which computes unique decimal equivalents for the same phase matrices that were used to illustrate the LGXP limitation. Both examples in Fig. 4 demonstrate that the same decimal-equivalent representation is yielded for different Gabor phase values. The computation of GPXNP addresses this limitation of LGXP, and Fig. 5 illustrates its steps.
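A minimal NumPy sketch of Eqs. (5)–(9) is given below. It is illustrative only: the kernel size and the values of k_max, f and σ are assumptions of this rewrite, since the paper does not report the Gabor parameters used.

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(mu, nu, size=31, k_max=np.pi / 2, f=np.sqrt(2), sigma=2 * np.pi):
    """Gabor kernel psi_{mu,nu}(z) of Eq. (6) with the wave vector of Eq. (7)."""
    k = (k_max / f ** nu) * np.exp(1j * np.pi * mu / 8.0)   # complex wave vector
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    z2 = x ** 2 + y ** 2
    k2 = np.abs(k) ** 2
    carrier = np.exp(1j * (k.real * x + k.imag * y)) - np.exp(-sigma ** 2 / 2)
    return (k2 / sigma ** 2) * np.exp(-k2 * z2 / (2 * sigma ** 2)) * carrier

def gabor_phase_and_magnitude(image, mu, nu):
    """Convolution of Eq. (5); magnitude and phase of Eqs. (8)-(9)."""
    g = fftconvolve(image.astype(float), gabor_kernel(mu, nu), mode='same')
    return np.abs(g), np.angle(g)   # A_{mu,nu}(z), Phi_{mu,nu}(z) in radians
```

Note that np.angle returns phases in radians in (−π, π]; a shift to the 0–360° range of Table 1 is needed before quantization.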

3.1.2 Compute GPXNP in binary and decimal form

Table 1 shows the phase ranges and quantized values. Figure 4 shows Gabor phases represented in a 3 × 3 neighborhood, followed by the quantized values obtained as shown in Table 1. Next, the transpose of the quantized matrix is computed. Finally, the center pixel $z_c$ is XNORed with all eight neighborhood pixels; the resultant matrix is the GPXNP represented in the 3 × 3 neighborhood. Formally, GPXNP in binary and decimal form is defined as follows.

Fig. 4  LGXP encoding of (a) example 1 and (b) example 2, yielding the same decimal equivalent (binary 11101111, decimal 239) for two different Gabor phase matrices

$$\mathrm{GPXNP}_{\mu,\nu}(z_c) = \left[\mathrm{GPXNP}^{P}_{\mu,\nu}, \mathrm{GPXNP}^{P-1}_{\mu,\nu}, \ldots, \mathrm{GPXNP}^{1}_{\mu,\nu}\right]_{\text{binary}} = \left[\sum_{i=1}^{P} 2^{\,i-1}\cdot \mathrm{GPXNP}^{i}_{\mu,\nu}\right]_{\text{decimal}} \qquad (10)$$

where $z_c$ denotes the central pixel position in the Gabor phase map with scale ν and orientation μ, and P is the size of the neighborhood. For $i = 1, 2, \ldots, P$, the pattern for $z_c$ and its neighbor $z_i$ is computed as follows:

$$\mathrm{GPXP}^{i}_{\mu,\nu} = q(\Phi_{\mu,\nu}(z_c)) \oplus q'(\Phi_{\mu,\nu}(z_i)) \qquad (11a)$$

$$\mathrm{GPXNP}^{i}_{\mu,\nu} = q(\Phi_{\mu,\nu}(z_c)) \;\mathrm{XNOR}\; q'(\Phi_{\mu,\nu}(z_i)) \qquad (11b)$$

where $\Phi_{\mu,\nu}(\cdot)$ denotes the phase, $q(\cdot)$ denotes the quantization operator, which calculates the quantized code of the phase according to the number of phase ranges as defined in (12), and $q'$ denotes the transpose of the quantized matrix:

$$q(\Phi_{\mu,\nu}(\cdot)) = i, \quad \text{for } i = 0, 1, \ldots, b-1, \quad \text{if } \frac{360 \cdot i}{b} \le \Phi_{\mu,\nu}(\cdot) < \frac{360 \cdot (i+1)}{b} \qquad (12)$$

where b denotes the number of phase ranges. After this, the GPXNP descriptor of the input character image is computed.
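The sketch below applies the Table 1 quantization and the XNOR-with-centre-pixel encoding of Eqs. (10)–(12) to a single 3 × 3 phase neighborhood. It is an illustrative assumption of this rewrite: the paper does not fully specify the bit-level XNOR of multi-bit codes or the neighbor read-out order, so here two codes XNOR to 1 when they are equal and to 0 otherwise, and the neighbors are read clockwise from the top-left.

```python
import numpy as np

def quantize_phase(phi_deg, b=4):
    """q(.) of Eq. (12): map a phase in [0, 360) degrees to one of b codes (Table 1: b = 4)."""
    return (np.asarray(phi_deg) // (360.0 / b)).astype(int)

def gpxnp_code(phase_patch_deg, b=4):
    """GPXNP of Eqs. (10)-(11b) for one 3x3 Gabor-phase neighborhood given in degrees."""
    q = quantize_phase(phase_patch_deg, b)   # quantized phases
    qt = q.T                                 # q': transpose of the quantized matrix
    centre = q[1, 1]
    # simplified XNOR: 1 if the neighbour code equals the centre code, else 0 (assumption)
    xnor = (qt == centre).astype(int)
    # read the 8 neighbours (skipping the centre) into a binary string, then a decimal value
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    binary = ''.join(str(xnor[i, j]) for i, j in order)
    return binary, int(binary, 2)

# example usage with a hypothetical 3x3 phase matrix (degrees):
# patch = np.array([[92, 170, 262], [140, 45, 85], [131, 282, 149]])
# print(gpxnp_code(patch))
```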

3.1.3 Compute GPXNP descriptor

With the pattern defined above, one pattern map is calculated for each Gabor kernel. Each pattern map is then divided into m non-overlapping sub-blocks, and the sub-block histograms over all orientations and scales are concatenated to form the proposed GPXNP descriptor of the input character image:

$$H = \left[H_{\mu_0,\nu_0,1}, \ldots, H_{\mu_0,\nu_0,m}, \ldots, H_{\mu_{L-1},\nu_{S-1},1}, \ldots, H_{\mu_{L-1},\nu_{S-1},m}\right] \qquad (13)$$

where $H_{\mu,\nu,i}$ for i = 1, 2, ..., m denotes the histogram of the ith sub-block of the GPXNP map with scale ν and orientation μ, and L and S denote the numbers of orientations and scales respectively. The steps to compute GPXNP are listed in Algorithm 1, described by Fig. 6.

3.2 Pattern descriptor

This system performs pattern-based image comparison by measuring the similarity between patterns represented by their features. Simple geometric features are used to describe the patterns. Geometric features efficiently discriminate patterns with large differences; therefore, they are useful for eliminating false hits. To extract the pattern descriptor, the system first performs skeletonization and thinning operations. The image is then converted into a vector, and the pattern descriptor is formed.

3.2.1 Vector conversion

Furthermore, the morphologically processed binary image of size 128 × 128 is converted into a one-dimensional vector, as shown in Fig. 7c. This one-dimensional vector is formed by a column-wise reading of the pixel values, i.e. 16,384 pixel values in all; similarly, a one-dimensional vector is also formed by a row-wise reading of the pixels. Every character has a unique pattern of sequential placement of pixels. Sequential analysis of this one-dimensional vector leads to the formation of the pattern descriptor vector shown in Fig. 7d. This unique pattern descriptor generates five values each for the column vector and the row vector, representing the five different bit patterns shown in Fig. 7d. For each character image, two pattern descriptors are thus yielded, one each for the row and column readings. This can be examined from Table 2. Figure 8 describes the process of extraction of the pattern descriptor.

Fig. 5  GPXNP encoding of (a) example 1 and (b) example 2, yielding unique decimal equivalents for two different Gabor phase matrices

Table 1  Phase ranges and quantized values

Phase range (degrees)   Quantized phase value
0–89                    0
90–179                  1
180–269                 2
270–359                 3

3.2.2 Pattern descriptor formation

This sub-section describes the formation of the pattern descriptor, which is defined by a set of numbers produced to represent a given character. Here, the pattern descriptor is extracted as a set of five pattern counts for the column-wise and row-wise readings of the one-dimensional vector. The five patterns are defined as 010, 0110, 01110, 011110 and 0111110, as shown in Fig. 7d. This feature vector quantifies the character shape.

As shown in Fig. 6, a pattern descriptor consisting of a set of 5 integer values is obtained. Each character is thus represented by a row and a column vector, as shown in Fig. 7c, and this vector formation leads to the extraction of pattern descriptors for the row and column respectively. The extracted pattern descriptors show that occurrences of the pattern '010' are quite common in all characters, while the counts of the patterns '0110', '01110', '011110' and '0111110' decrease in that order, because these longer binary sequences are increasingly rare.
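A minimal counting sketch of this step is given below. It is an illustration under the assumption that each pattern occurrence is counted with a simple left-to-right scan allowing overlaps; the paper does not specify how overlapping matches are handled.

```python
import numpy as np

PATTERNS = ['010', '0110', '01110', '011110', '0111110']

def count_overlapping(s, pattern):
    # str.count() would skip overlapping matches, so count them explicitly
    return sum(1 for i in range(len(s) - len(pattern) + 1) if s.startswith(pattern, i))

def pattern_descriptor(binary_image):
    """Count the five bit patterns in the row-wise and column-wise 1-D vectors of a
    binary character image (e.g. 128 x 128); returns two 5-element count vectors."""
    img = (np.asarray(binary_image) > 0).astype(int)
    row_vec = ''.join(map(str, img.flatten(order='C')))   # row-wise reading
    col_vec = ''.join(map(str, img.flatten(order='F')))   # column-wise reading
    counts = lambda s: [count_overlapping(s, p) for p in PATTERNS]
    return counts(row_vec), counts(col_vec)
```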

3.3 Contour direction probability distribution function (CDPDF)

CDPDF has been proven to be a new and very effective statistical feature for automatic writer identification in offline handwriting [14]. This feature represents a texture described by a probability distribution computed from the image. It is useful for analyzing trends in the writing of a particular character in terms of slant, curvature, roundness, etc. The algorithm described in Fig. 9 gives the steps to compute this feature.

Algorithm 1: Extraction of Gabor Phase XNOR Pattern (GPXNP) (F1)
Input: I, a preprocessed input image.
Output: GPXNP descriptor of the input character image.
Step 1: Gabor wavelet representation
  (a) Define the Gabor kernel.
  (b) Define the wave vector.
  (c) Compute the Gabor phase and magnitude.
Step 2: Create the GPXNP of the image
  (a) Compute GPXNP in binary and decimal form.
  (b) Compute the neighborhood pattern.
  (c) Represent the GPXNP descriptor using the histogram of the image.

Fig. 6  Algorithm for extraction of GPXNP

Fig. 7  (a) Handwritten image of 'ka'; (b) binary image of size 128 × 128; (c) one-dimensional vector of size 16,384; (d) pattern descriptor vector for row and column (total counts of the patterns '010', '0110', '01110', '011110' and '0111110')

Table 2  Analyzing the performance of the 'k' value (recognition efficiency in percent for different training-to-testing ratios)

k value   9:1      8:2      7:3
2         78.34    86.33    80.34
3         76.36    80.34    79.00
4         73.00    74.34    75.04
5         71.67    72.00    73.86
6         73.34    74.34    71.67
7         69.67    71.67    74.00
8         67.00    69.67    72.67
9         65.00    68.00    68.67
10        64.00    61.67    64.67

3.3.1 Orientation estimation

The orientation estimation is a fundamental step in contour direction feature extraction. The steps for calculating the orientation at pixel (i,j) are as follows:

1. A block of size W × W is centered at pixel (i,j) in the input image.

2. For each pixel in the block, compute the gradients $\partial_x(i,j)$ and $\partial_y(i,j)$, which are the gradient magnitudes in the x and y directions respectively.

3. The local orientation at pixel (i,j) can then be estimated using the following equations:

$$V_x(i,j) = \sum_{u=i-\frac{W}{2}}^{\,i+\frac{W}{2}} \sum_{v=j-\frac{W}{2}}^{\,j+\frac{W}{2}} 2\,\partial_x(u,v)\,\partial_y(u,v) \qquad (14)$$

$$V_y(i,j) = \sum_{u=i-\frac{W}{2}}^{\,i+\frac{W}{2}} \sum_{v=j-\frac{W}{2}}^{\,j+\frac{W}{2}} \left(\partial_x^2(u,v) - \partial_y^2(u,v)\right) \qquad (15)$$

$$\theta(i,j) = \frac{1}{2}\tan^{-1}\frac{V_y(i,j)}{V_x(i,j)} \qquad (16)$$

where θ(i,j) is the least-squares estimate of the local orientation of the block centered at pixel (i,j). Figure 10a shows the orientation of the contour for the Gujrati character 'ka'.

4. Smooth the orientation field in a local neighborhood using a Gaussian filter. The orientation image is first converted into a continuous vector field, defined as

$$\Phi_x(i,j) = \cos(2\theta(i,j)) \qquad (17)$$

$$\Phi_y(i,j) = \sin(2\theta(i,j)) \qquad (18)$$

where $\Phi_x$ and $\Phi_y$ are the x and y components of the vector field respectively. After the vector field has been computed, Gaussian smoothing is performed as follows:

$$\Phi'_x(i,j) = \sum_{u=-w_\Phi/2}^{w_\Phi/2} \sum_{v=-w_\Phi/2}^{w_\Phi/2} G(u,v)\,\Phi_x(i-u,\, j-v) \qquad (19)$$

$$\Phi'_y(i,j) = \sum_{u=-w_\Phi/2}^{w_\Phi/2} \sum_{v=-w_\Phi/2}^{w_\Phi/2} G(u,v)\,\Phi_y(i-u,\, j-v) \qquad (20)$$

where G is a Gaussian low-pass filter of size $w_\Phi \times w_\Phi$.

5. The final smoothed orientation field O at pixel (i,j) is defined as

$$O(i,j) = \frac{1}{2}\tan^{-1}\frac{\Phi'_y(i,j)}{\Phi'_x(i,j)} \qquad (21)$$

Fig. 8  Algorithm for extraction of the pattern descriptor

Algorithm 3: Extraction of Contour Direction Probability Distribution Function (F3)
Input:
  - A preprocessed input image.
  - Sigma of the derivative of Gaussian used to compute image gradients.
  - Sigma of the Gaussian weighting used to sum the gradient moments.
  - Sigma of the Gaussian used to smooth the final orientation vector field.
Output:
  - The orientation image in radians.
  - A measure of the reliability of the orientation estimate.
Step 1: Calculate image gradients.
Step 2: Estimate the local ridge orientation.
Step 3: Smooth the covariance data to perform a weighted summation of the data.
Step 4: Analytic solution of the principal direction:
  - sine and cosine of the doubled angles;
  - smoothed sine and cosine of the doubled angles.
Step 5: Calculate the 'reliability' of the orientation data.

Fig. 9  Algorithm for extraction of CDPDF

3.3.2 Angle histogram computation

Next, an angle histogram is computed. It is represented by a polar plot showing the distribution of values grouped according to their numeric range. Figure 10b shows the polar plot, or ridge-orientation graph, for the Gujrati character 'ka' considered in this illustration. Here, θ is distributed into bins that are 30° apart. The vector theta, expressed in radians, determines the angle of each bin from the origin. The length of each bin reflects the number of elements of theta that fall within that group, ranging from zero up to the greatest number of elements deposited in any one bin.
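A compact sketch of the orientation-field and angle-histogram computation (Eqs. (14)–(21) and Sect. 3.3.2) follows. The sigmas, the contour mask threshold and the 12-bin (30°) histogram are assumptions except where the text states them, and the atan2 argument convention follows the standard double-angle least-squares estimate rather than the paper's exact notation.

```python
import numpy as np
from scipy import ndimage

def cdpdf(image, grad_sigma=1.0, block_sigma=3.0, smooth_sigma=3.0, n_bins=12):
    f = image.astype(float)
    # image gradients via derivative-of-Gaussian filters (cf. Algorithm 3 inputs)
    gx = ndimage.gaussian_filter(f, grad_sigma, order=(1, 0))
    gy = ndimage.gaussian_filter(f, grad_sigma, order=(0, 1))
    # Gaussian-weighted sums of the gradient moments, cf. Eqs. (14)-(15)
    vx = ndimage.gaussian_filter(2.0 * gx * gy, block_sigma)
    vy = ndimage.gaussian_filter(gx ** 2 - gy ** 2, block_sigma)
    theta = 0.5 * np.arctan2(vx, vy)               # local double-angle orientation estimate
    # smooth the doubled-angle vector field, Eqs. (17)-(20)
    px = ndimage.gaussian_filter(np.cos(2.0 * theta), smooth_sigma)
    py = ndimage.gaussian_filter(np.sin(2.0 * theta), smooth_sigma)
    orient = 0.5 * np.arctan2(py, px)              # smoothed orientation field, Eq. (21)
    # Sect. 3.3.2: angle histogram over contour pixels (30-degree bins when n_bins = 12)
    mag = np.hypot(gx, gy)
    mask = mag > 0.1 * mag.max()                   # crude contour mask (assumption)
    hist, _ = np.histogram(orient[mask], bins=n_bins, range=(-np.pi / 2, np.pi / 2))
    return hist / max(hist.sum(), 1)               # normalized: a probability distribution
```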

3.4 Autocorrelation

A basic shortcoming of most sophisticated approaches to character recognition is their inability to recognize displaced or mis-oriented characters. Generally speaking, these approaches need to regard each different position of a character as a different character. This intolerance of positional changes calls for a special approach to the general character recognition task.

3.4.1 The notion of self-matching

The authors exploit the notion of self-matching, or autocorrelation. An idealized character image can be made self-detecting by comparison with a displaced copy of the original character. Changes in the orientation of the original character displace the function along the x axis without altering its shape. Figure 11 describes the steps to compute the autocorrelation feature.

Assume N pairs of observations on two variables x and y. The correlation coefficient between x and y is given by

$$\Phi = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\left[\sum (x_i - \bar{x})^2\right]^{1/2} \left[\sum (y_i - \bar{y})^2\right]^{1/2}} \qquad (22)$$

where the summations are over the N observations.
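For Eq. (22), the correlation between a character vector and a displaced copy of itself can be computed directly, as in the sketch below; the circular shifts and the choice of max_shift are assumptions made for illustration.

```python
import numpy as np

def correlation_coefficient(x, y):
    """Eq. (22): correlation coefficient between two observation vectors."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return (xc * yc).sum() / (np.sqrt((xc ** 2).sum()) * np.sqrt((yc ** 2).sum()))

def autocorrelation_feature(binary_image, max_shift=8):
    """Self-matching: correlate the flattened image with shifted copies of itself."""
    v = (np.asarray(binary_image) > 0).astype(float).flatten()
    return np.array([correlation_coefficient(v, np.roll(v, s)) for s in range(1, max_shift + 1)])
```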

4 Derivation of new distance measure Mean χ²

Based on these parameters for deriving new measures, the authors propose a new distance measure, Mean χ², given by Eq. (23):

$$\text{Mean } \chi^2 = \frac{(P_i - Q_i)^2}{P_i} + \frac{(P_i - Q_i)^2}{Q_i} \qquad (23)$$

where P and Q belong to

$$\Gamma_n = \left\{ P = (p_1, p_2, \ldots, p_n) \;\middle|\; p_i > 0,\ \sum_{i=1}^{n} p_i = 1 \right\},\quad n \ge 2, \qquad (24)$$

the set of all complete finite discrete probability distributions.
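A direct implementation of the Mean χ² measure, summed over histogram bins as in Eq. (35) of Sect. 5, might look as follows. The small epsilon guarding against empty bins is an assumption of this sketch, since Eq. (24) requires strictly positive entries.

```python
import numpy as np

def mean_chi_square(p, q, eps=1e-12):
    """Mean chi-square distance of Eqs. (23)/(35) between two histograms or distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    d2 = (p - q) ** 2
    return float(np.sum(d2 / p + d2 / q))
```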

Fig. 10  (a) The orientation of the contour in character 'ka'; (b) polar plot showing the angle histogram for character 'ka'

Algorithm 5: Extraction of the Autocorrelation feature (F5)
Input: A preprocessed input image.
Output: Autocorrelation data for the image.
Step 1: Read the data specified in the input file.
Step 2: Represent the input image in the form of a row vector.
Step 3: Create a matrix containing a gray-scale color map.
Step 4: Compute the autocorrelation of the input data.

Fig. 11  Algorithm for extraction of autocorrelation

4.1 Mathematical hypothesis

In mathematics, a distance measure, distance metric or distance function is a function that defines a distance between the elements of a set. A set with a metric is called a metric space. A metric induces a topology on a set, but not all topologies can be generated by a metric. A topological space whose topology can be described by a metric is called metrizable.

In differential geometry, the word 'metric' may refer to a bilinear form, defined on the tangent vectors of a differentiable manifold and mapping onto a scalar, that allows distances along curves to be determined through integration. It is more properly termed a metric tensor.

This section presents the mathematical hypothesis used to justify the essential properties of a distance measure. A metric on a set X is a function, called the distance function or simply distance,

$$d : X \times X \to \mathbb{R}$$

where $\mathbb{R}$ is the set of real numbers. For all x, y, z in X, this function is required to satisfy the following conditions:

1. d(x,y) ≥ 0: property of non-negativity.
2. d(x,y) = d(y,x): property of symmetry.
3. d(x,z) ≤ d(x,y) + d(y,z): property of subadditivity or triangle inequality.

It is desirable that a distance measure obey the third property, i.e. the triangle inequality. However, many researchers in the pattern recognition domain show that in non-metric spaces boundary points are less significant for capturing the structure of a class than they are in Euclidean spaces [19]. They suggest parametric techniques for supervised learning problems that involve specific non-metric distance functions, showing in particular how to generalize the idea of linear discriminant functions in a way that may be more useful in non-metric spaces.

Many researchers emphasize broader views that ignore metric constraints. It is not possible to mention all opinions, but a few researchers [20] show that visual recognition is broader than just pair matching, especially when there are multi-class training data and large sets of features in a learning context. They reconsider the assumption of recognition as a pair-matching test and introduce a new formal definition that captures the broader context of the problem. A meta-analysis and an experimental assessment of the top algorithms show that metric properties are often violated by good recognition algorithms. By studying these violations, useful insights come to light: the authors of [20] make the case that locally metric algorithms should leverage outside information to solve the general recognition problem.

4.1.1 Proof of non-negativity

For all P, Q ∈ Γ_n: in mathematics, in particular geometry, a distance function on a given set M is a function $d : M \times M \to \mathbb{R}$, where $\mathbb{R}$ denotes the set of real numbers, that satisfies the following.

Condition 1: d(x,y) ≥ 0, and d(x,y) = 0 if and only if x = y, i.e. the distance is positive between two different points and is zero precisely from a point to itself.

Proof (distance is positive between two different points): From (23), consider

$$\text{Mean } \chi^2(x) = \frac{(x-1)^2}{x} + (x-1)^2, \quad x \in (0, \infty).$$

Then

$$\text{Mean } \chi^{2\,\prime}(x) = \frac{2x^3 - x^2 - 1}{x^2} \qquad (25)$$

and

$$\text{Mean } \chi^{2\,\prime\prime}(x) = \frac{2x^3 + 2}{x^3}. \qquad (26)$$

Thus Mean χ²″(x) > 0 for all x > 0, and hence Mean χ² is strictly convex for all x > 0. Furthermore, Mean χ² = 0 at x = 1. This proves that Mean χ² is non-negative and convex in the pair of probability distributions (P, Q) ∈ Γ_n × Γ_n [21].

4.1.2 Proof of symmetry

Condition 2: the distance is symmetric, d(x,y) = d(y,x), i.e. the distance between x and y is the same in either direction.

Proof: To prove the symmetry of the proposed measure, the authors consider four measures that are well known in the literature on information theory and statistics. The Bhattacharyya distance [22] is given by

$$B(P\|Q) = \sum_{i=1}^{n} \sqrt{p_i q_i}. \qquad (27)$$

The Hellinger distance [22] is given by

$$h(P\|Q) = 1 - B(P\|Q) = \frac{1}{2}\sum_{i=1}^{n} \left(\sqrt{p_i} - \sqrt{q_i}\right)^2. \qquad (28)$$

The χ² distance [22] is given by

$$\chi^2(P\|Q) = \sum_{i=1}^{n} \frac{(p_i - q_i)^2}{q_i} \qquad (29)$$

and the relative information [21] is

$$K(P\|Q) = \sum_{i=1}^{n} p_i \ln\!\left(\frac{p_i}{q_i}\right). \qquad (30)$$

We observe that the measures (29) and (30) are not symmetric with respect to the probability distributions. The symmetric version [22] of the measure in (30) is given by

$$J(P\|Q) = K(P\|Q) + K(Q\|P). \qquad (31)$$

Along similar lines, a symmetric χ² distance measure can be obtained as follows [21]:

$$\chi^2(P\|Q) + \chi^2(Q\|P) = \sum_{i=1}^{n}\frac{(p_i - q_i)^2 (p_i + q_i)}{p_i q_i} = \sum_{i=1}^{n}\frac{p_i (p_i - q_i)^2 + q_i (p_i - q_i)^2}{p_i q_i} = \sum_{i=1}^{n}\left[\frac{(p_i - q_i)^2}{p_i} + \frac{(p_i - q_i)^2}{q_i}\right] \qquad (32)$$

This is the proposed distance measure expressed by Eq. (23).

4.1.3 Proof of subadditivity/triangle inequality

Consider P(0,0), Q(b,0) and R(0,a) as points in the feature space, with their coordinates as shown in Fig. 12. The objective is to prove that PQ + QR ≥ PR. Assume PR = a and PQ = b. If QR is represented by the Mean χ² measure, then

$$QR = \frac{(Q_i - R_i)^2}{Q_i} + \frac{(Q_i - R_i)^2}{R_i} = \frac{[(b-0)-(0-a)]^2}{(b-0)} + \frac{[(b-0)-(0-a)]^2}{(0-a)} = \frac{[b-(-a)]^2}{b} - \frac{[b-(-a)]^2}{a}$$

Therefore

$$QR = \frac{(b+a)^2}{b} - \frac{(b+a)^2}{a}$$

$$PQ + QR = b + \frac{(b+a)^2}{b} - \frac{(b+a)^2}{a} = \frac{ab^2 + a(b+a)^2 - b(a+b)^2}{ab} = \frac{ab^2 + a(a^2+b^2+2ab) - b(a^2+b^2+2ab)}{ab} = \frac{ab^2 + a^3 + ab^2 + 2a^2b - a^2b - b^3 - 2ab^2}{ab}$$

Cancelling terms,

$$PQ + QR = \frac{a^3 + a^2 b - b^3}{ab}$$

Now, if a = b,

$$PQ + QR = \frac{a^3 + a^3 - a^3}{a^2} = a = PR \qquad (33)$$

and if a > b,

$$PQ + QR = \frac{a^3 + a^2 b - b^3}{ab} > a. \qquad (34)$$

From (33) and (34) it follows that

$$PQ + QR \ge PR.$$

This proves that Mean χ² obeys the triangle inequality.

5 Recognition

This section presents the proposed classifier algorithm. Four features are extracted from the test character image, and classification is performed using the proposed modified k-NN algorithm. This algorithm uses the novel Mean χ² distance measure, which overcomes the limitations of the conventional k-NN algorithm that stem from the Euclidean distance. The modification to conventional k-NN is therefore proposed in terms of feature weights and a new distance measure.

k-NN classification finds a group of k objects in the training set that are closest to the test object, and bases the assignment of a label on the predominance of a particular class in this neighborhood. For a test image T, the distances to all the other images in the database are computed, and the images are then ordered in a list sorted by increasing distance to the query image T. A large number of distance measures have been studied [22]; the choice of distance or similarity measure depends on the measurement type or representation of the objects.

All five features discussed in this system are represented differently. GPXNP and CDPDF are represented by histograms; therefore, the Euclidean distance is not a suitable choice for comparing these histograms. The new Mean χ² distance measure for matching the GPXNP and CDPDF features is given as

$$\text{Mean } \chi^2 = \sum_{n=1}^{N} \left[\frac{(T_n - D_n)^2}{T_n} + \frac{(T_n - D_n)^2}{D_n}\right] \qquad (35)$$

where $T_n$ and $D_n$ denote the test-image and database-image histograms, n is the bin index and N is the number of histogram bins. The new Mean χ² distance measure outperforms the conventional k-NN in terms of improved recognition efficiency. The pattern descriptor and zone profiles are matched with the Euclidean distance, denoted here by E_D and given as

$$E\_D = \sum_{i=1}^{m} \sum_{j=1}^{k} \left(T_{(i,j)} - D_{(i,j)}\right)^2 \qquad (36)$$

Additionally, autocorrelation is computed by Eq. (22). The autocorrelation distance, denoted by A_D, is a triangular distance measure represented as

$$A\_D = \sum_{n=1}^{N} \frac{(T_n - D_n)^2}{T_n + D_n} \qquad (37)$$

The total distance D is computed as

$$D = w_1 \cdot \text{Mean}\,\chi^2 + w_2 \cdot E\_D + w_3 \cdot E\_D + w_4 \cdot \text{Mean}\,\chi^2 + w_5 \cdot A\_D \qquad (38)$$

where $w_1, w_2, w_3, w_4$ and $w_5$ denote the feature weights for the GPXNP, pattern descriptor, zone profile, CDPDF and autocorrelation features respectively. The feature weights are the reciprocals of the corresponding distances, used as their multipliers; E_D and A_D denote the Euclidean distance and the autocorrelation distance respectively.

After feature matching, the top 'k' characters are chosen for each feature individually and, finally, majority voting is used to choose the exact character.

Fig. 12  Proof of the triangle inequality
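A schematic of the matching stage (Eqs. (35)–(38) plus voting) is sketched below. The feature dictionary keys are hypothetical names, and because the paper only says that the weights are "reciprocals of the distances", the weighting here is read as a per-feature normalizer (reciprocal of the mean distance of that feature over the database); this is one plausible interpretation, not the authors' exact rule.

```python
import numpy as np
from collections import Counter

def mean_chi_square(t, d, eps=1e-12):                     # Eq. (35)
    t, d = np.asarray(t, float) + eps, np.asarray(d, float) + eps
    return float(np.sum((t - d) ** 2 / t + (t - d) ** 2 / d))

def euclidean_sq(t, d):                                   # Eq. (36)
    t, d = np.asarray(t, float), np.asarray(d, float)
    return float(np.sum((t - d) ** 2))

def triangular(t, d):                                     # Eq. (37)
    t, d = np.asarray(t, float), np.asarray(d, float)
    return float(np.sum((t - d) ** 2 / (t + d + 1e-12)))

# distance used per feature: Mean chi-square for the histogram features,
# Euclidean for pattern descriptor / zone profile, triangular for autocorrelation
FEATURE_DISTANCES = {'gpxnp': mean_chi_square, 'pattern': euclidean_sq,
                     'zone': euclidean_sq, 'cdpdf': mean_chi_square,
                     'autocorr': triangular}

def classify(test_feats, database, k=2):
    """database: list of (features_dict, label); combines per-feature distances as in Eq. (38)
    and returns the majority label among the k nearest database characters."""
    per_feat = {name: np.array([fn(test_feats[name], feats[name]) for feats, _ in database])
                for name, fn in FEATURE_DISTANCES.items()}
    weights = {name: 1.0 / (dists.mean() + 1e-12) for name, dists in per_feat.items()}
    total = sum(weights[name] * per_feat[name] for name in per_feat)
    top_k = np.argsort(total)[:k]
    votes = Counter(database[i][1] for i in top_k)
    return votes.most_common(1)[0][0]
```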

6 Results and discussion

6.1 Experimental set up

Experiments are performed on the data set described in Sect. 1.2. With a predefined k, the number of neighbors is selected, and the performance of the system is evaluated by varying k from 2 to 10. It is found that the recognition efficiency is maximum for small values of k, such as 2 and 3. The authors propose K-fold cross-validation for validating the k value that yields the maximum recognition efficiency.

6.2 Performance analysis

6.2.1 Analyzing the performance of 'k' value in weighted k-NN

Table 2 shows the recognition results, in percent, on the data set with varying ratios of training to testing samples and varying k values. The recognizer performs well for lower values of k. The best value of k is selected as 2 from these results, as it yields the maximum recognition efficiency of 86.33 % with k = 2 and a training-to-testing ratio of 8:2.

6.2.2 Analyzing the performance of every individual character

Once the best value of k is selected as 2, the performance on individual characters, including consonants and vowels, is analyzed for the different training-to-testing sample ratios stated above. Table 3 shows the results of this experimentation.

Table 3  Recognition efficiency with different ratios of training versus testing samples for k = 2

6.2.3 Analyzing the performance using K-fold cross-validation

Cross-validation is done by partitioning a sample of data into complementary subsets, performing the analysis on one subset as the training set, and validating on the other subset as the test set. The advantage of this method over repeated random sub-sampling is that all observations are used for both training and validation, and each observation is used for validation exactly once. The cross-validation process for the proposed system is repeated K times, with K = 10, so that each of the sample partitions is used exactly once as the validation data. Ten-fold cross-validation, i.e. K = 10 with k = 2, yielded almost identical results across the experiments.
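For completeness, a sketch of the 10-fold protocol is shown below, using scikit-learn's StratifiedKFold purely for the index bookkeeping. `extract_features` and `classify` stand for the feature extraction pipeline and the weighted k-NN sketch above; they are assumed names, not functions defined by the authors.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def cross_validate(samples, labels, K=10, k=2):
    """samples: list of preprocessed character images; labels: their class ids."""
    labels = np.asarray(labels)
    accuracies = []
    skf = StratifiedKFold(n_splits=K, shuffle=True, random_state=0)
    for train_idx, test_idx in skf.split(np.zeros((len(labels), 1)), labels):
        # build the feature database from the training fold (extract_features is assumed)
        database = [(extract_features(samples[i]), labels[i]) for i in train_idx]
        predictions = [classify(extract_features(samples[i]), database, k=k) for i in test_idx]
        accuracies.append(np.mean(np.asarray(predictions) == labels[test_idx]))
    return float(np.mean(accuracies)), float(np.std(accuracies))
```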

7 Conclusion and future work

A weighted k-NN algorithm based on language-independent feature extraction techniques for Gujrati OCR is presented. Enhancements to the k-NN algorithm have been proposed by several researchers in different forms. To mention a few, hubness-based fuzzy measures for high-dimensional k-nearest neighbor classification have been explored [23]. Another variation of k-NN with distance weights has been presented [24]; its two variants, Bayesian-KNN (BKNN) and Citation-KNN (CKNN), are widely used for solving multi-instance classification problems. Furthermore, the non-asymptotic behavior of k-NN is exploited in the classification process by [25].

The authors present a novel pattern descriptor that demonstrates the ability of the proposed technique to recognize curves, holes and a variety of strokes in Gujrati script. In addition to this structural pattern descriptor, the statistical features GPXNP, CDPDF and autocorrelation are used. The authors exploit Gabor phase information effectively by using GPXNP under the framework of local pattern encoding.

This hybrid feature framework is used as input to the weighted k-NN algorithm, which employs efficient distance measures to handle the choice-of-distance problem that occurs in conventional k-NN. To measure distances for the three different feature types, the authors deploy three different distance formulas. Feature weights are used to emphasize a feature that displays a lower distance value, as the feature weights are reciprocals of the distances. In conventional k-NN, by contrast, the single distance formula used for all features is either the Euclidean distance or the Mahalanobis distance. Experiments reveal that the proposed approach yields reasonably high recognition efficiency compared with the conventional k-NN algorithm.

This system can easily be extended for the recognition of other language scripts such as Devanagri. However, the experimental results also show that a few characters are slightly confused by the recognizer because of the similarity of shapes and large shape variations. A majority of the errors in this category are in the recognition of visually similar characters and characters with a matra, i.e. the vertical line [26–29].

This work is probably one of the initial attempts towards recognition of the full character set of isolated Gujrati characters. The authors therefore faced the tough challenge of obtaining a good set of handwritten characters during the implementation of this research.

Robust experimentation on the handwritten character samples of Gujrati was performed. The following concluding points represent critical issues for future research:

1. There is an immense need for standardized data sets using well-defined sets of characters for Indic scripts, in order to allow meaningful comparison of different published approaches.
2. Techniques for post-processing of the classifier output need to be evolved to improve recognition of visually similar characters.
3. For better recognition rates, appropriate combinations of feature extraction methods with different classifiers are needed.

In addition to conventional k-NN and NN as shown in Table 3, the authors implemented three neuro-fuzzy classifiers based on [30–33], in which neuro-fuzzy classifiers were proposed for different pattern recognition problems. These existing techniques include the Adaptive Neuro Fuzzy Classifier, the Evolving Fuzzy Neuro Classifier using fuzzy hedges, and the Evolving Fuzzy Neuro Classifier using feature selection. Research shows that these classifiers [30–33] work well for various pattern recognition problems.

Table 4  Performance evaluation against existing techniques

No.  Classification method                                      Recognition η in percent
1    Conventional k-NN                                          16.09
2    Neural network                                             24.38
3    Adaptive neuro fuzzy classifier                            46.67
4    Evolving neuro fuzzy classifier using fuzzy hedges         52.00
5    Evolving neuro fuzzy classifier using feature selection    68.00
6    Weighted k-NN                                              86.33


The combined results of the comparison of the proposed system with these three classifiers, in addition to conventional k-NN and NN, are presented in Fig. 12. This comparative study shows that the proposed weighted k-NN with the Mean χ² distance outperforms all the other classifiers in terms of recognition efficiency (Table 4).

References

1. Yagnik A, Mohan SR (2006) Identification of Gujrati characters using wavelets and neural networks. In: Proceedings of artificial intelligence and soft computing, pp 150–155
2. Kokku A, Srinivasa Chakravarthy V (2009) A complete OCR system development for Tamil magazine documents. In: OCR for Indic scripts, Advances in Pattern Recognition. Springer, Berlin, pp 147–162
3. Antani S, Agnihotri L (1999) Gujrati character recognition. In: ICDAR, pp 418–421
4. Shah SK, Sharma A (2006) Design and implementation of optical character recognition system to recognize Gujarati script using template matching. IE(I) J ET 86:44–49
5. Desai A (2010) Gujarati handwritten numeral optical character reorganization through neural network. Pattern Recognition 43(7):2582–2589. Elsevier Science Inc., New York
6. Dholakia J, Negi J, Rama Mohan S (2005) Zone identification in the printed Gujarati text. In: ICDAR, pp 272–276
7. Maloo M, Kale KV (2011) Gujarati script recognition: a review. Int J Comput Sci Eng (IJCSE) 8:480–489
8. Chaudhuri BB, Bera S (2010) Line, word and character segmentation from handwritten Bangla text documents. In: Proceedings of the international conference on advances in computer vision and information technology. I.K. International Publishing, New Delhi, pp 542–551
9. Lehal GS, Singh C (2002) A complete OCR system for Gurmukhi script. In: Lecture notes in computer science, vol 2396. Springer, New York, pp 344–352
10. Taneja IJ (2006) Bounds on triangular discrimination, harmonic mean and symmetric chi-square divergences. J Concrete Appl Math 4:91–111
11. Maloo M, Kale KV (2011) Support vector machine based Gujarati numeral recognition. Int J Comput Sci Eng (IJCSE) 3:2595–2600
12. Clowes MB, Parks JR (1961) A new technique in automatic character recognition. Comput J 4(2):121–128
13. Cheriet M, Kharma N, Liu C-L, Suen CY (2007) Character recognition systems: a guide for students and practitioners. Wiley, New York
14. Bulacu M, Schomaker L, Brink A (2007) Text-independent writer identification and verification on offline Arabic handwriting. In: ICDAR 2007, IEEE Computer Society, vol II, pp 769–773
15. Dong P, Brankov JG, Galatsanos NP, Yang Y, Davoine F (2005) Digital watermarking robust to geometric distortions. IEEE Trans Image Process 14(12):2040–2050
16. Lam L, Lee S-W, Suen CY (1992) Thinning methodologies: a comprehensive survey. IEEE Trans Pattern Anal Mach Intell 14(9):879
17. Chacko BP, Vimal Krishnan VR, Raju G, Babu Anto P (2012) Handwritten character recognition using wavelet energy and extreme learning machine. Int J Mach Learn Cybern 3(2):149–161
18. Xie S, Shan S, Chen X, Chen J (2010) Fusing local patterns of Gabor magnitude and phase for face recognition. IEEE Trans Image Process 19(5):1349–1361
19. Jacobs DW (2000) Classification with nonmetric distances: image retrieval and class representation. IEEE Trans Pattern Anal Mach Intell 22(6):583–600
20. Scheirer WJ, Wilber MJ, Eckmann M, Boult TE (2013) Good recognition is non-metric. Comput Vision Pattern Recognit
21. Cha SH (2007) Comprehensive survey on distance/similarity measures between probability density functions. Int J Math Models Methods Appl Sci 1(4):300–307
22. Wu X, Kumar V, Quinlan JR, Ghosh J, Yang Q, Motoda H, McLachlan GJ, Ng A, Liu B, Yu PS, Zhou ZH, Steinbach M, Hand DJ, Steinberg D (2008) Top 10 algorithms in data mining. J Knowl Inf Syst 14:1–37
23. Tomasev N, Radovanovic M, Mladenic D, Ivanovic M (2012) Hubness-based fuzzy measures for high-dimensional k-nearest neighbor classification. Int J Mach Learn Cybern. doi:10.1007/s13042-012-0137-1
24. Jiang L, Cai Z, Wang D, Zhang H (2013) Bayesian citation-KNN with distance weighting. Int J Mach Learn Cybern. doi:10.1007/s13042-013-0152-x
25. Dhurandhar A, Dobra A (2012) Probabilistic characterization of nearest neighbor classifier. Int J Mach Learn Cybern. doi:10.1007/s13042-012-0091-y
26. Agarwal M, Ma H, Doermann D (2010) Online handwriting recognition for Indic scripts. In: Advances in pattern recognition, pp 125–146
27. Neeba NV, Namboodiri A, Jawahar CV, Narayanan PJ (2010) Recognition of Malayalam documents. In: Advances in pattern recognition, pp 125–146
28. Mukhtar O, Setlur S, Govindaraju V (2010) Experiments in Urdu text recognition. In: Guide, advances in pattern recognition, pp 125–146
29. Natrajan P, MacRostie E, Decerbo M (2009) The BBN Byblos Hindi OCR system. IEEE Trans Image Process 19(5):1349–1361
30. Jang J-SR (1993) ANFIS: adaptive-network-based fuzzy inference system. IEEE Trans Syst Man Cybern 23(3):665–686
31. Cetisli B, Barkana A (2009) Speeding up the scaled conjugate gradient algorithm and its application in neuro-fuzzy classifier training. Soft Computing: A Fusion of Foundations, Methodologies and Applications. Springer, Berlin, pp 365–378. doi:10.1007/s00500-009-0410-8
32. Cetisli B (2010) Development of an adaptive neuro-fuzzy classifier using linguistic hedges: part 1. Expert Syst Appl 37:6093–6101
33. Cetisli B (2010) The effect of linguistic hedges on feature selection: part 2. Expert Syst Appl 37:6102–6108
