8/2/2019 IEEEtbme
Robust Band Profile Extraction Algorithm, Using Constrained N-P Machine Learning Technique
Shadab Khan, Joao Sanches and Rodrigo Ventura
Abstract: Poor image quality, typical of images of bone marrow cells taken during cell division, makes it difficult to extract an accurate band profile representing the intensity distribution along chromosomes, and therefore calls for a robust method. An algorithm was developed that estimates a single-line medial axis, the basis for computing the band profile. The medial axis is generated by computing an ultimate prediction from two sources: a primary prediction, obtained by a nonparametric machine learning algorithm trained on data from the chromosome skeleton, and a secondary prediction, obtained from geometrical properties of the medial axis. Experiments were performed using the LK1 dataset. The algorithm was found capable of estimating a satisfactory single-line medial axis, and the band profile obtained was found to accurately represent intensity levels in different regions of the chromosomes. Additionally, the algorithm proved robust: it was capable of growing a very small seed region into the desired medial axis and handled highly irregular chromosomes well.
Index Terms: Medial Axis, Discrete Curve Evolution, Band Profile, Biological Cells.
I. INTRODUCTION
In the field of cytogenetics, the karyotype is the set of features that can be used to study taxonomic relationships, chromosomal aberrations, or past steps of evolution. Karyotyping is the procedure by which these studies are carried out. Because the manual procedure requires considerable time from an expert, automating it is highly desirable. In this regard, classifier design often suffers from measurement degradation in the features on which classification relies. The band profile is one such prominent feature and has been widely used, for example in [1]-[4]. For the classifier to discriminate with a high classification rate, the extracted band profile should accurately represent the spatial distribution of intensity over the regions of the chromosome. Thus, an algorithm that can accurately extract the band profiles of chromosomes is essential.
Although sufficient literature is already available for processing images from high-quality datasets such as Copenhagen or Edinburgh, a satisfactory method for images of bone marrow cells taken during mitosis is still missing. Images of bone marrow cells are problematic because the chromosomes are often distorted in shape, with considerable blur and unclear edges. Close observation of [3, Fig. 6d] reveals that the lines orthogonal to the medial axis, which were drawn to compute the band profile, often intersect in regions that are not close to the boundary. This is highly undesirable: the integrals of intensity values along these lines were used to compute the band profile, so the same pixel contributes multiple times. Jau Hong Kao et al. [4] proposed a method that renders the medial axis visually better than [3], but the problem caused by unconstrained interpolation can still be observed in [4, Fig. 4a]. Thus, a robust algorithm that adapts to changes in the contour of the chromosome was required.
S. Khan is with Manipal Inst. of Technology, India. Rodrigo Ventura and Joao Sanches are with the Institute for Systems and Robotics, Technical Superior Institute, Lisbon, Portugal.
Fig. 1. A typical karyogram from the LK1 dataset.
The algorithm proposed here starts by computing the medial axis. It first trains a nonparametric machine learning algorithm on a training set taken from the skeleton of the chromosome and uses it to compute a primary prediction. Using the dependence of the medial-axis points on the contour of the chromosome, it also computes a secondary prediction. Finally, the primary and secondary predictions are combined into a final prediction, which is appended to the part of the skeleton with which the algorithm started (the seed region). The algorithm continues iteratively until a complete single-line medial axis has been estimated, and as a last step the band profile is computed. The algorithm described is robust, computationally inexpensive, and capable of processing chromosomes with highly irregular contours.
II. ALGORITHM DESCRIPTION
In this work, to simplify further discussion, the medial axis of a closed contour is defined as a single continuous curve traversing the length of the contour. A formal definition of the skeleton is given later. The complete algorithm to compute the band profile can be divided into four major steps: (1) preprocessing of the chromosome subimage; (2) skeletonizing the chromosome subimage; (3) seed region growing; and (4) mesh laying and reconstruction of the geometrically compensated image.
[Fig. 2, panels (a)-(h)]
Fig. 2. Chromosome images. a) Chromosome skeleton marked white; b) pruned skeleton, with the discrete curve evolved for the chromosome marked blue and bifurcation points marked red; c) seed region marked red, original boundary of the chromosome marked white; d) boundary obtained using DCE with 20 vertices; e) structuring matrix; h) original medial axis marked red; i) smoothed medial axis; j) seed region with 10 elements marked red; k) grown medial axis from the seed region in Fig. 3c.
A. Preprocessing of chromosome subimage
Chromosome subimages are extracted one by one from an ordered karyogram image prepared by experts. Each subimage is then padded with additional rows and columns to aid the subsequent skeletonization step, and is binarized using a suitable threshold.
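A minimal sketch of this step follows, assuming 8-bit grayscale subimages with dark chromosomes on a light background. The paper does not name its threshold, so Otsu's method is used here purely as a stand-in; the padding width of 5 pixels, the border-replication padding mode, and the function names `otsu_threshold` and `preprocess` are hypothetical choices for illustration.

```python
import numpy as np


def otsu_threshold(img):
    """Otsu's threshold for a uint8 image (a stand-in; the paper's threshold is unspecified)."""
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                      # weight of the class [0, t]
    mu = np.cumsum(prob * np.arange(256))        # first moment of the class [0, t]
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    safe = np.where(denom > 0, denom, 1.0)       # avoid division by zero at the ends
    between = np.where(denom > 0, (mu_total * omega - mu) ** 2 / safe, 0.0)
    return int(np.argmax(between))               # first t maximizing between-class variance


def preprocess(sub, pad=5, threshold=None):
    """Pad the subimage by replicating its border, then binarize (True = chromosome)."""
    padded = np.pad(sub, pad, mode="edge")       # assumes the subimage border is background
    t = otsu_threshold(padded) if threshold is None else threshold
    return padded <= t                           # dark chromosome on a light background
```

Replicating the border, rather than padding with a fixed value, avoids introducing a third intensity level that could pull an automatic threshold away from the chromosome/background split.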
B. Skeletonizing the chromosome subimage
In this step, the skeleton of the chromosome subimage is computed. Because the boundary edges of the chromosomes are noisy, owing to the poor quality of the images in the LK1 dataset, several unwanted skeleton branches were observed, which had to be pruned. Numerous methods for skeleton pruning exist, each with its own advantages and disadvantages. The method used here has the merit of fast computation and allows control over the desired number of branches in the skeleton. For this purpose, the skeleton pruning algorithm based on contour partitioning using discrete curve evolution (DCE) [5] was chosen, and is discussed here in brief. Skeleton pruning by contour partitioning is robust to boundary noise and produces a stable skeleton [5]. A few definitions are given here to aid the explanation.
It is assumed that a planar set $D$ is the closure of a connected subset of $\mathbb{R}^2$ and that its boundary $\partial D$ consists of simple closed curves that are analytic and polygonal. According to Blum [6], the skeleton $S(D)$ of a set $D$ is the locus of the centres of closed disks in $D$ that touch $\partial D$ and are not subsets of any other disk in $D$ (maximal disks). Formally, skeleton pruning is then defined as the elimination of points $s \in S(D)$ whose generating points all lie in the same open segment of the partitioned contour. $\mathrm{Tan}(s)$ is the set of points at which the maximal disk centred at $s$ intersects $\partial D$, and the degree of $s$, $\deg(s)$, is the cardinality of $\mathrm{Tan}(s)$.
Since any closed digital curve can be regarded as a polygonal curve with many vertices (each pixel denoting a vertex), DCE returns a simpler representation of $\partial D$ as follows. Initially, the contour is divided into segments, with the number of vertices equal to the number of pixels on $\partial D$. Then, in every evolution step, two neighboring line segments $s_1, s_2$ are replaced by a single segment joining the endpoints of $s_1 \cup s_2$. The substitution is based on the relevance measure $K$ given by:
$$K(s_1, s_2) = \frac{\beta(s_1, s_2)\, l(s_1)\, l(s_2)}{l(s_1) + l(s_2)}$$
In the above formula, $s_1, s_2$ are line segments incident on a common vertex $v$, $\beta(s_1, s_2)$ is the turn angle at $v$, and $l$ is the length function normalized with respect to the total length of the polygonal curve. The properties of $K$ of interest here are discussed in [7] and [8]. A higher value of $K(s_1, s_2)$ signifies a relatively higher contribution of $s_1 \cup s_2$ to the overall shape of the contour. Starting with an input polygon $P$ with $n$ vertices, DCE successively produces simpler polygons $P = P^{n}, P^{n-1}, \ldots, P^{3}$ (the superscript denotes the number of vertices), such that $P^{n-m}$ is obtained by removing the vertex $v$ with the smallest $K$ from $P^{n-(m-1)}$. The skeleton $S(D)$ so obtained is then thinned iteratively until a one-pixel-wide skeleton is obtained.
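The vertex-removal loop described above can be sketched as follows. This is an illustrative reimplementation of the DCE evolution only, not the code of [5] (which additionally uses the evolved polygon to partition the contour for pruning). It assumes the turn angle $\beta$ is the absolute angle between consecutive edge directions and normalizes lengths by the perimeter of the current polygon; `relevance` and `dce` are hypothetical names.

```python
import numpy as np


def relevance(prev, v, nxt, total_len):
    """K(s1, s2) for segments s1 = [prev, v] and s2 = [v, nxt]."""
    s1, s2 = v - prev, nxt - v
    n1, n2 = np.linalg.norm(s1), np.linalg.norm(s2)
    if n1 == 0 or n2 == 0:
        return 0.0                                  # degenerate (duplicate) vertex
    cos_beta = np.dot(s1, s2) / (n1 * n2)
    beta = np.arccos(np.clip(cos_beta, -1.0, 1.0))  # turn angle at v, in [0, pi]
    l1, l2 = n1 / total_len, n2 / total_len         # normalized segment lengths
    return beta * l1 * l2 / (l1 + l2)


def dce(points, n_keep):
    """Remove the least relevant vertex until n_keep (at least 3) vertices remain."""
    poly = [np.asarray(p, dtype=float) for p in points]
    while len(poly) > max(n_keep, 3):
        m = len(poly)
        total = sum(np.linalg.norm(poly[i] - poly[i - 1]) for i in range(m))
        k = [relevance(poly[i - 1], poly[i], poly[(i + 1) % m], total) for i in range(m)]
        del poly[int(np.argmin(k))]                 # P^(n-m) from P^(n-(m-1))
    return np.array(poly)
```

For example, on a square contour sampled with extra midpoints on each side, the midpoints have a zero turn angle and hence $K = 0$, so `dce(points, 4)` removes them first and returns the four corners. Calling `dce(contour, 20)` corresponds to the 20-vertex evolution used later for boundary smoothing.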
C. Seed Region Growing
When DCE is forced to produce a skeleton with four endpoints, it also yields two bifurcation points, where the set of bifurcation points is defined as $\{s \in S(D) : \deg(s) \geq 3\}$. The seed region, as the term is used here, is the part of $S(D)$ that connects the two bifurcation points. The medial axis $M(D)$ of the entire chromosome is obtained by growing this seed region until it includes two points of $\partial D$, which signals the algorithm to stop expanding the seed region. To simplify further discussion, the seed region will also be referred to as $M(D)$; since the emphasis is on growing the seed region into the medial axis, this should not cause confusion.
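For a one-pixel-wide digital skeleton, the condition $\deg(s) \geq 3$ is commonly approximated by the crossing number, the count of 0-to-1 transitions around the 8-neighbourhood of a pixel. The sketch below uses that proxy (an assumption; the paper does not say how bifurcation points were detected) and then extracts the seed region as the shortest 8-connected skeleton path between the two bifurcation points. All function names are hypothetical.

```python
from collections import deque

import numpy as np

# 8-neighbourhood offsets in clockwise order, starting at the top-left pixel
RING = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def _on(skel, r, c):
    """True if (r, c) is inside the image and is a skeleton pixel."""
    h, w = skel.shape
    return 0 <= r < h and 0 <= c < w and bool(skel[r, c])


def crossing_number(skel, r, c):
    """Number of 0-to-1 transitions in the cyclic 8-neighbourhood of (r, c)."""
    ring = [_on(skel, r + dr, c + dc) for dr, dc in RING]
    return sum(1 for i in range(8) if not ring[i - 1] and ring[i])


def bifurcation_points(skel):
    """Skeleton pixels with crossing number >= 3 (proxy for deg(s) >= 3)."""
    rows, cols = np.nonzero(skel)
    return [(int(r), int(c)) for r, c in zip(rows, cols) if crossing_number(skel, r, c) >= 3]


def seed_region(skel, a, b):
    """Shortest 8-connected path of skeleton pixels from a to b (breadth-first search)."""
    prev = {a: None}
    queue = deque([a])
    while queue:
        p = queue.popleft()
        if p == b:
            break
        for dr, dc in RING:
            q = (p[0] + dr, p[1] + dc)
            if q not in prev and _on(skel, *q):
                prev[q] = p
                queue.append(q)
    if b not in prev:
        return []                                # a and b are not connected
    path = [b]
    while prev[path[-1]] is not None:
        path.append(prev[path[-1]])
    return path[::-1]
```

The crossing number is preferred here over a plain neighbour count because, in 8-connectivity, pixels adjacent to a junction can also have three skeleton neighbours without being branch points.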
In image coordinates, $M(D)$ is described by two column vectors, $X$ and $Y$, giving the locations of the pixels that constitute it, so that $M(D) = [X\ Y]$. $X$ always contains a monotonically increasing or decreasing sequence, since $M(D)$ for a chromosome whose length is greater than its breadth is spatially distributed along its length, as shown in Fig. 1d.
A smoothed form of $\partial D$ is also obtained, the purpose of which will become clear later. Since the boundary of the chromosome is very irregular, it is smoothed using a three-step process:
1) An image opening operation is performed on the chromosome subimage with a structuring element, as noted below:
$$A \circ B = \bigcup \{ (B)_z \mid (B)_z \subseteq A \}$$
where $A$ is the chromosome image and $B$ is the structuring element shown in Fig. 1g.
2) Discrete curve evolution is applied to the boundary, retaining 20 vertices.
3) All elements of $\partial D$ except the 20 vertices found using DCE are discarded, and neighboring vertices are connected to form a closed polygonal curve. This boundary will be referred to as $\partial D_2$.
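The opening in step 1 can be sketched with plain NumPy as an erosion followed by a dilation, which yields the union of all translates $(B)_z$ that fit inside $A$. The structuring element of Fig. 1g is not reproduced here, so the example assumes a 3x3 square purely for illustration, and the function names are hypothetical; library routines such as SciPy's `ndimage.binary_opening` offer the same operation.

```python
import numpy as np


def erode(A, B):
    """A eroded by B: z is kept iff B translated to z lies inside A (B has odd sides)."""
    kr, kc = B.shape[0] // 2, B.shape[1] // 2
    P = np.pad(A, ((kr, kr), (kc, kc)), constant_values=False)
    out = np.ones(A.shape, dtype=bool)
    for i, j in zip(*np.nonzero(B)):
        out &= P[i:i + A.shape[0], j:j + A.shape[1]]   # A shifted by offset (i-kr, j-kc)
    return out


def dilate(A, B):
    """A dilated by B: union of copies of A translated by every element of B."""
    kr, kc = B.shape[0] // 2, B.shape[1] // 2
    P = np.pad(A, ((kr, kr), (kc, kc)), constant_values=False)
    out = np.zeros(A.shape, dtype=bool)
    for i, j in zip(*np.nonzero(B)):
        out |= P[2 * kr - i:2 * kr - i + A.shape[0], 2 * kc - j:2 * kc - j + A.shape[1]]
    return out


def opening(A, B):
    """A o B = (A eroded by B) dilated by B = union of translates of B that fit in A."""
    return dilate(erode(A, B), B)
```

Thin protrusions narrower than $B$ cannot contain any translate of $B$, so they vanish under opening, while regions wide enough to hold $B$ are preserved. This is why the operation suppresses small spurs on the chromosome boundary before the 20-vertex DCE of steps 2 and 3.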
To grow $M(D)$ means to include an extra element $E_M = (X(i), Y(i))$. In the context of this work, the point to be included, $p$, should satisfy the following conditions:
1) If a curve $C$ is defined to represent the spatial distribution of $M(D)$ at $E_M$, then the normal $\mathrm{norm}(p)$ to $C$ at $E_M$ intersects $\partial D$ at points $D_1$ and $D_2$ such that $\|D_1 - E_M\| \approx \|D_2 - E_M\|$, where $\|pt_1 - pt_2\|$ denotes the Euclidean distance between points $pt_1$ and $pt_2$.
2) For $E_M$ with coordinates $(X(i), Y(i))$, $X(i) \notin X$, and $E_M$ lies close to one of the endpoints of $M(D)$, named $EP_1$ and $EP_2$. Here, close means $\|EP_{1\,\mathrm{or}\,2} - E_M\| \leq T$, where $T$ is a threshold parameter ensuring that $E_M$ lies close to either $EP_1$ or $EP_2$. This is an intuitive requirement: a predicted point lying too far from $EP_1$ or $EP_2$ might induce oscillations in $C$, which are unlikely given the usual structure of chromosomes in the dataset; such a point can instead be attributed to a small protuberance on the boundary and should be ignored.
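Both conditions need the points $D_1$ and $D_2$ where the normal meets the boundary. A minimal sketch follows, assuming the boundary is a closed polygon such as $\partial D_2$ with $(x, y)$ vertices and that the tangent of $C$ at the candidate point has already been estimated (for instance from the last few axis points). The tolerance `tol` used to make "approximately equal" concrete, and all function names, are hypothetical.

```python
import numpy as np


def _ray_hits(origin, direction, polygon):
    """Parameters t >= 0 at which origin + t * direction crosses the closed polygon."""
    hits = []
    for k in range(len(polygon)):
        a, b = polygon[k], polygon[(k + 1) % len(polygon)]
        m = np.column_stack([direction, a - b])      # solve t*d + u*(a - b) = a - origin
        if abs(np.linalg.det(m)) < 1e-12:
            continue                                 # ray parallel to this edge
        t, u = np.linalg.solve(m, a - origin)
        if t >= 0 and 0 <= u <= 1:
            hits.append(t)
    return hits


def normal_boundary_points(p, tangent, polygon):
    """D1 and D2: nearest crossings of the normal at p with the boundary, one on each side."""
    p = np.asarray(p, float)
    poly = np.asarray(polygon, float)
    t = np.asarray(tangent, float) / np.linalg.norm(tangent)
    nrm = np.array([-t[1], t[0]])                    # tangent rotated by 90 degrees
    side1, side2 = _ray_hits(p, nrm, poly), _ray_hits(p, -nrm, poly)
    if not side1 or not side2:
        return None                                  # no crossing on one side: p is outside
    return p + min(side1) * nrm, p - min(side2) * nrm


def meets_criteria(p, d1, d2, endpoint, X, T, tol=0.2):
    """Condition 1 (p roughly centred between D1, D2) and condition 2 (new x, near endpoint)."""
    p = np.asarray(p, float)
    a, b = np.linalg.norm(d1 - p), np.linalg.norm(d2 - p)
    centred = abs(a - b) <= tol * max(a, b)
    new_x = not np.any(np.isclose(np.asarray(X, float), p[0]))
    near = np.linalg.norm(np.asarray(endpoint, float) - p) <= T
    return bool(centred and new_x and near)
```

A `None` result from `normal_boundary_points` is also a convenient termination signal: a point with no boundary crossing on one side of its normal cannot lie inside the chromosome.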
The algorithm begins by finding a plausible candidate $p$, which will be referred to as the primary prediction. This is done by first finding the next element in the sequence $X$ and then predicting $p$ using a nonparametric machine learning algorithm, namely locally weighted linear regression. A training set $S$ is formed by taking the $p_n$ elements of $X$ and $Y$ that lie towards their extremities, as shown in Fig. 3. In the following, $S_x$ and $S_y$ denote the input feature vector and the target variable vector of $S$, respectively. A hypothesis $h$ is then defined such that:
$$h_\theta(x) = \theta_0 + \theta_1 x$$
$h_\theta(x)$ is the function used to predict the output value for input $x$, and the $\theta$s are the parameters of $h$. To constrain $h_\theta(x)$ to follow the values described by the vector $S_y$, a two-step process is performed:
1) Fit $\theta$ to minimize $\sum_{j=0}^{p_n} W(j)\,\{S_y(j) - h_\theta(S_x(j))\}^2$.
2) Output $h_\theta(x) = \theta^T \mathbf{x}$, where $\mathbf{x} = [1,\ x]^T$.
Here the $W(j)$ are nonnegative weights, and $\theta^T$ is the transpose of the parameter vector $\theta$. In the context of this work, the weight function used was a shifted version of a fairly standard model:
$$W(j) = \exp\left( -\frac{\big|\, |S_x(j) - x| - p_n \big|}{2\tau^2} \right)$$
Here $x$ is the query point and $\tau$ is the bandwidth parameter that controls the rate of decay of the exponential function. The shift introduced in this way gives higher weight to points lying away from the boundary elements of $M(D)$. This is a valid choice, since during subsequent iterations of the algorithm that grows $M(D)$, the elements lying away from the endpoints $EP_1$ or $EP_2$ are more likely to have been part of the original seed region. Next, using the hypothesis function $h_\theta(x)$ with the $x$ coordinate of $p$ as input, the $y$ coordinate is predicted, and the normal $\mathrm{norm}(p)$ to $C$ at $p$ is found. The next step is to decide whether $p$ meets the criteria for $E_M$; if not, a secondary prediction is computed:
1) Let $\mathrm{norm}(p) \cap \partial D = \{D_1, D_2\}$.
2) Calculate $\|D_1 - p\|$ and $\|D_2 - p\|$.
3) If $\|D_1 - p\| / \|D_2 - p\| \geq 0.8$, the prediction is reliable; if not, calculate $p_2 = 0.5\,(D_1 + D_2)$.
4) If $p_2$ is located such that $\|p_2 - p\| \leq T_1$, then
$$E_M = \frac{Wt_1\, p + Wt_2\, p_2}{Wt_1 + Wt_2}$$
where $T_1$ is a threshold used to either validate or discard the secondary observation $p_2$, and $Wt_1$ and $Wt_2$ are the weights assigned to $p$ and $p_2$ when computing $E_M$. $T_1$ also helps identify small protuberances and irregularities in the contour, which do not contribute significantly to the topology of the chromosome. Depending on the shape of the chromosome and the variation along its boundary, it may be desirable to choose the element to be included in $M(D)$ based on $p$ or on $p_2$; this allows control over how fast the slope of $C$ at $E_M$ changes. Finally, $M(D)$ is updated with the extra element $E_M$ added beyond $EP_1$ or $EP_2$, depending on whether the growing was performed on the upper or lower half of the chromosome subimage. The algorithm then continues with the next iteration, this time with the $E_M$ obtained in the last iteration serving as $EP_1$ or $EP_2$. The process terminates when an $E_M$ is found beyond the boundary of the chromosome, resulting in a single-line medial axis $M(D)$.
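The primary prediction and its combination with the secondary prediction can be sketched as follows. The weight uses the shifted form given above; the weighted least-squares problem is solved by scaling each row by $\sqrt{W(j)}$. Several details are assumptions not fixed by the text: the default bandwidth `tau=2.0`, equal weights $Wt_1 = Wt_2 = 1$, taking the reliability ratio as shorter over longer distance so that it is symmetric in $D_1$ and $D_2$, and keeping $p$ when $p_2$ is discarded. Function names are hypothetical.

```python
import numpy as np


def lwlr_predict(sx, sy, x, tau=2.0):
    """Primary prediction: y at x from locally weighted linear regression with shifted weights."""
    sx, sy = np.asarray(sx, float), np.asarray(sy, float)
    pn = len(sx)
    w = np.exp(-np.abs(np.abs(sx - x) - pn) / (2.0 * tau ** 2))   # W(j), peaks at distance p_n
    A = np.column_stack([np.ones_like(sx), sx])                   # rows [1, S_x(j)]
    sw = np.sqrt(w)
    theta, *_ = np.linalg.lstsq(A * sw[:, None], sy * sw, rcond=None)  # weighted least squares
    return theta[0] + theta[1] * x                                # h(x) = theta0 + theta1 * x


def ultimate_prediction(p, d1, d2, t1, wt1=1.0, wt2=1.0, min_ratio=0.8):
    """Combine the primary prediction p with the secondary prediction p2 (steps 1 to 4)."""
    p, d1, d2 = (np.asarray(v, float) for v in (p, d1, d2))
    a, b = np.linalg.norm(d1 - p), np.linalg.norm(d2 - p)
    if max(a, b) > 0 and min(a, b) / max(a, b) >= min_ratio:
        return p                                   # step 3: p is reliable
    p2 = 0.5 * (d1 + d2)                           # secondary prediction: chord midpoint
    if np.linalg.norm(p2 - p) <= t1:
        return (wt1 * p + wt2 * p2) / (wt1 + wt2)  # step 4: weighted combination
    return p                                       # assumption: p2 discarded, keep p
```

For instance, with $p = (5, 2)$ between $D_1 = (5, 0)$ and $D_2 = (5, 10)$, the ratio is 0.25, so $p_2 = (5, 5)$ is computed; with $T_1 = 4$ it is accepted and $E_M = (5, 3.5)$, while with $T_1 = 1$ it is treated as a protuberance and $p$ is kept.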
D. Mesh laying & geometrically compensated image
After an accurate medial axis has been obtained, the next step is geometrical compensation of the image for feature extraction. For this purpose, the chromosome was assumed to be a two-dimensional deformable surface. To capture the spatial distribution of intensity levels and preserve it for an accurate shape reconstruction, a mesh was laid over the chromosome. This mesh was then transformed to reconstruct the chromosome shape. Mesh laying over the chromosome and geometrical compensation of the chromosome subimage were performed through the following steps:
1) Smoothing of the medial axis using successive cubic splines, first with knots at intervals of 3, then 4, and finally 8 (Fig. 5).
2) Draw the orthogonal line $\mathrm{orth}(s)$ for every $s \in M(D)$, and find the set $I_S = \{\mathrm{orth}(s) \cap \partial D\}$.
3) Extract the profile of the chromosome subimage lying between the elements of $I_S$, and transfer it into a new image to reconstruct the geometrically compensated chromosome subimage.
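Steps 2 and 3 can be sketched as sampling the image along the normal at every medial-axis point, so that each row of the output corresponds to one line $\mathrm{orth}(s)$. This is a simplified sketch under stated assumptions: the axis is given as already-smoothed (row, col) points (the spline step is not shown), tangents come from finite differences, sampling is nearest-neighbour, and a fixed hypothetical `half_width` replaces stopping each line at its intersections with $\partial D$.

```python
import numpy as np


def straighten(img, axis_rc, half_width):
    """Sample the image along the normal at every medial-axis point (nearest neighbour)."""
    axis = np.asarray(axis_rc, float)                   # (K, 2) array of (row, col) points
    tang = np.gradient(axis, axis=0)                    # finite-difference tangents
    tang /= np.linalg.norm(tang, axis=1, keepdims=True)
    normal = np.column_stack([-tang[:, 1], tang[:, 0]]) # tangent rotated by 90 degrees
    offsets = np.arange(-half_width, half_width + 1)
    pts = axis[:, None, :] + offsets[None, :, None] * normal[:, None, :]  # (K, 2w+1, 2)
    r = np.clip(np.rint(pts[..., 0]).astype(int), 0, img.shape[0] - 1)
    c = np.clip(np.rint(pts[..., 1]).astype(int), 0, img.shape[1] - 1)
    return img[r, c]                                    # row n holds the samples on orth(s_n)
```

Because the sampled lines are only as long as `half_width`, they stay short near tight bends of a smooth axis; in practice, limiting each line to its boundary intersections, as in step 2, is what keeps lines from reaching across the chromosome and crossing each other.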
The band profile $H(n)$ of the chromosome subimage is then computed as the average intensity across each row $n$ of the shape-compensated image:
$$H(n) = \frac{1}{N} \sum_{i=1}^{N} I(n, i)$$
where $N$ is the number of columns in the shape-compensated chromosome subimage and $I(n, i)$ is the intensity of the pixel in row $n$ and column $i$.
[Fig. 3, panels (a) and (b)]
Fig. 3. Chromosomes from class 2 along with their band profiles; the correspondence between the profiles can be observed.
III. RESULTS AND DISCUSSION
Fig. 4 and Fig. 5 show the effectiveness of the algorithm in finding a smooth medial axis that can be used to compute the band profile. In the context of karyotyping, it is important that the orthogonal lines $\mathrm{orth}(s)$, $s \in M(D)$, do not intersect, so that no pixel contributes to the band profile more than once. This objective was met, as can be confirmed by visual inspection of Fig. 5. Fig. 5c and Fig. 5d demonstrate the robustness of the algorithm in accurately predicting the elements to be included in the medial axis: it was able to grow a seed region with 10 elements (Fig. 5c) into $M(D)$ with 60 elements (Fig. 5d). Fig. 7 shows the band profiles for a pair of chromosomes from class 2. The correspondence between the band profiles can be confirmed visually and by considering the normalized correlation between them.
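The band profile and the comparison between two profiles can be sketched as follows. The paper does not define its normalized correlation, so the zero-mean (Pearson) form is assumed here, as are equal-length profiles; the function names are hypothetical.

```python
import numpy as np


def band_profile(straight):
    """H(n): mean intensity across row n of the shape-compensated image."""
    return np.asarray(straight, float).mean(axis=1)


def normalized_correlation(h1, h2):
    """Zero-mean normalized correlation of two equal-length band profiles, in [-1, 1]."""
    a = np.asarray(h1, float) - np.mean(h1)
    b = np.asarray(h2, float) - np.mean(h2)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

A value close to 1 indicates that the two profiles rise and fall together regardless of overall brightness or contrast, which is the kind of correspondence expected between homologous chromosomes.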
IV. FUTURE WORK
An accurate algorithm to compute the medial axis of chromosomes from the LK1 dataset is very motivating, so it was of interest to see the pairing results based only on band profiles, even though accurate pairing is known to be impossible using band profiles alone. To avoid measurement degradation during the comparison, the two band profiles were first aligned by estimating a shift constant $\delta$ such that
$$\delta = \operatorname*{arg\,max}_{\delta} \{\rho(h_1(n), h_2(n - \delta))\}$$
where $\rho(x, y) = x^T y$ is the correlation function of the column vectors $x$ and $y$, and $h_1(n)$ and $h_2(n)$ are the band profiles of the first and second chromosomes of the same class. The optimization is performed by evaluating the correlation function for $\delta \in \{-10, \ldots, 10\}$ and choosing the value that maximizes it. The distance between the chromosomes is the Euclidean distance between one profile and the other, shifted by $\delta$:
$$d(i, j) = \|h_i(n) - h_j(n - \delta)\|_2$$
In experiments on the LK1 dataset, the algorithm paired chromosomes from the same class perfectly in, on average, 10 out of 22 cases.
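The alignment and distance above can be sketched as follows. The text does not say how samples shifted in from outside the profile are handled, so zero filling is assumed; the profiles are also assumed to have equal length (for example after a hypothetical resampling step), and the function names are illustrative.

```python
import numpy as np


def shift(h, d):
    """Return g with g[n] = h[n - d]; samples shifted in from outside are zero (assumption)."""
    h = np.asarray(h, float)
    g = np.zeros_like(h)
    if d > 0:
        g[d:] = h[:-d]
    elif d < 0:
        g[:d] = h[-d:]
    else:
        g[:] = h
    return g


def estimate_shift(h1, h2, max_shift=10):
    """delta in {-max_shift, ..., max_shift} maximizing rho = h1^T h2(n - delta)."""
    h1 = np.asarray(h1, float)
    deltas = range(-max_shift, max_shift + 1)
    scores = [h1 @ shift(h2, d) for d in deltas]
    return deltas[int(np.argmax(scores))]


def profile_distance(h1, h2, max_shift=10):
    """d = || h1(n) - h2(n - delta) ||_2 after alignment."""
    d = estimate_shift(h1, h2, max_shift)
    return float(np.linalg.norm(np.asarray(h1, float) - shift(h2, d)))
```

As a usage example, if `h2` is `h1` moved three samples earlier (`shift(h1, -3)`), `estimate_shift(h1, h2)` recovers a shift of 3 and the aligned distance is zero. Note that the unnormalized correlation $x^T y$ favours shifts that keep bright bands overlapping, so strongly differing overall intensities between two images can bias $\delta$.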
V. CONCLUSION
In this paper, an algorithm to accurately extract the band profile of chromosomes was presented. The algorithm was found to be robust in operation and computationally inexpensive to implement. With this vital tool for automatic karyotyping in place, the next aim is to develop an accurate chromosome pairing system that uses additional features, such as geometrical properties of chromosomes, along with statistical data.
REFERENCES
[1] J. Piper and E. Granum, "On fully automatic feature measurement for banded chromosome classification," Cytometry, no. 10, pp. 242-255, 1989.
[2] J. R. Stanley, M. J. Keller, P. Gader, and W. C. Caldwell, "Data-driven homologue matching for chromosome identification," IEEE Trans. on Med. Imag., vol. 17, no. 3, pp. 451-462, 1998.
[3] A. Khmelinskii, R. Ventura and J. Sanches, "A Novel Metric for Bone Marrow Cells Chromosome Pairing," IEEE Trans. on Biomed. Eng., to be published.
[4] J. H. Kao, J. H. Chuang and T. P. Wang, "Automatic Chromosome Classification Using Medial Axis Approximation and Band Profile Similarity," LNCS, vol. 3852, P. J. Narayanan et al., Eds. Berlin: Springer Verlag, 2006, pp. 274-283.
[5] X. Bai, L. J. Latecki and W. Y. Liu, "Skeleton Pruning by Contour Partitioning with Discrete Curve Evolution," IEEE Trans. on Pattern Anal. and Mach. Intell., vol. 29, no. 3, pp. 449-462, March 2007.
[6] H. Blum, "Biological Shape and Visual Science (Part I)," J. Theoretical Biology, vol. 38, pp. 205-287, 1973.
[7] L. J. Latecki and R. Lakamper, "Shape Similarity Measure Based on Correspondence of Visual Parts," IEEE Trans. on Pattern Anal. and Mach. Intell., vol. 22, no. 10, pp. 1185-1190, Oct. 2000.
[8] L. J. Latecki and R. Lakamper, "Application of Planar Shape Comparison to Object Retrieval in Image Databases," Pattern Recognition, vol. 35, no. 1, pp. 15-29, 2002.