ieeetbme

Upload: joao-sanches

Post on 05-Apr-2018


  • 8/2/2019 IEEEtbme


    Robust Band Profile Extraction Algorithm Using Constrained N-P Machine Learning Technique

    Shadab Khan, Joao Sanches and Rodrigo Ventura

    Abstract: Poor image quality is typical of images of bone marrow cells taken during cell division. It makes it difficult to extract an accurate band profile, that is, a profile representative of the intensity distribution over the chromosome, so a robust method is needed. An algorithm was therefore developed that estimates a single-line medial axis, which is the basis for computing the band profile. The medial axis is generated from an ultimate prediction that combines two predictions. The primary prediction comes from a nonparametric machine learning algorithm trained with data from the chromosome skeleton. The secondary prediction comes from the geometrical properties of the medial axis. Experiments were performed on the LK1 dataset. The algorithm was found capable of estimating a satisfactory single-line medial axis, and the resulting band profiles accurately represented the intensity levels in different regions of the chromosomes. The algorithm was also robust: it could grow a very small seed region into the desired medial axis and handled highly irregular chromosomes well.

    Index Terms: Medial Axis, Discrete Curve Evolution, Band Profile, Biological Cells.

    I. INTRODUCTION

    In the field of cytogenetics, the karyotype is the set of features that can be used to study taxonomic relationships, chromosomal aberrations, or past steps of evolution. Karyotyping is the procedure by which these studies are carried out. Performed manually, it requires considerable time from an expert, so automating it is highly desirable. In this regard, classifier design often suffers from measurement degradation in the features on which the classifier relies. The band profile is one such prominent feature and has been widely used, e.g., in [1]-[4]. An extracted band profile should accurately represent the spatial distribution of intensity over the regions of the chromosome surface, so that the classifier can discriminate with a high classification rate. An algorithm that accurately extracts chromosome band profiles is therefore essential.

    Sufficient literature is already available for processing images from high-quality datasets such as Copenhagen or Edinburgh. However, a satisfactory method for images of bone marrow cells taken during mitosis is still missing. These images are difficult because the chromosomes are often distorted in shape, with considerable blur and unclear edges. Close observation of [3, Fig. 6d] reveals that the lines orthogonal to the medial axis, which were drawn for the

    S. Khan is with Manipal Inst. of Technology, India ([email protected]). Rodrigo Ventura ([email protected]) and Joao Sanches ([email protected]) are with the Institute for Systems and Robotics, Technical Superior Institute, Lisbon, Portugal.

    Fig. 1. A typical karyogram from the LK1 dataset.

    computation of the band profile, often intersected in regions that were not close to the boundary. This is highly undesirable. The band profile was computed from integrals of intensity values along these lines, so the same pixel contributed multiple times. Jau Hong Kao et al. [4] proposed a method that renders the medial axis visually better than [3], but the problem caused by unconstrained interpolation can be observed in [4, Fig. 4a]. Thus, a robust algorithm that adapts to changes in the chromosome contour was required.

    The algorithm proposed here starts by computing the medial axis. It first trains a nonparametric machine learning algorithm on a training set taken from the chromosome skeleton and uses it to obtain a primary prediction. Using the known dependence of points on the medial axis on the chromosome contour, it also computes a secondary prediction. The primary and secondary predictions are then combined into a final prediction. This prediction is appended to the part of the skeleton with which the algorithm started (the seed region). The algorithm repeats this process until a complete single-line medial axis has been estimated, and as a last step the band profile is computed. The algorithm is robust, computationally inexpensive, and capable of processing chromosomes with highly irregular contours.

    II. ALGORITHM DESCRIPTION

    To simplify the discussion, this work defines the medial axis of a closed contour as a single continuous curve traversing the length of the contour. A formal definition of the skeleton is given later. The complete algorithm for computing the band profile can be divided into four major steps: (1) preprocessing of the chromosome subimage; (2) skeletonizing the chromosome subimage; (3) seed region growing; and (4) mesh laying and reconstruction of a geometrically compensated image.

    [Fig. 2: image panels (a)-(h), including a structuring-element matrix; not recoverable from this transcript.]

    Fig. 2. Chromosome images. a) Chromosome skeleton marked white; b) pruned skeleton, with the discrete curve evolved for the chromosome marked blue and bifurcation points marked red; c) seed region marked red, original boundary of the chromosome marked white; d) boundary obtained using DCE with 20 vertices; e) structuring matrix; h) original medial axis marked red; i) smoothed medial axis; j) seed region with 10 elements marked red; k) medial axis grown from the seed region in Fig. 3c.

    A. Preprocessing of chromosome subimage

    Chromosome subimages are extracted one by one from an ordered karyogram image prepared by experts. Each subimage is then padded with additional rows and columns to aid the subsequent skeletonization step and is binarized using a suitable threshold.
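A minimal sketch of this preprocessing step in Python, operating on plain lists of grayscale values. Some choices here are hypothetical and for illustration only: the padding width (5 pixels), the threshold (200), and the assumption that chromosomes are darker than a white (255) background. The text only says "a suitable threshold", and an automatic method such as Otsu's could be substituted.

```python
def pad_image(img, pad, fill):
    """Surround a 2-D list-of-lists image with `pad` rows and columns of `fill`."""
    width = len(img[0]) + 2 * pad
    padded = [[fill] * width for _ in range(pad)]
    for row in img:
        padded.append([fill] * pad + list(row) + [fill] * pad)
    padded.extend([fill] * width for _ in range(pad))
    return padded


def binarize(img, threshold, dark_foreground=True):
    """Return 1 for chromosome pixels and 0 for background pixels."""
    if dark_foreground:
        return [[1 if v < threshold else 0 for v in row] for row in img]
    return [[1 if v > threshold else 0 for v in row] for row in img]


def preprocess(subimage, pad=5, threshold=200, background=255):
    """Pad with background, then binarize (hypothetical default parameters)."""
    return binarize(pad_image(subimage, pad, background), threshold)
```

Padding with the background value keeps the chromosome away from the image border, so later neighbourhood operations (skeletonization, morphology) never run off the edge of the object.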

    B. Skeletonizing the chromosome subimage

    In this step, the skeleton of the chromosome subimage is computed. The images from the LK1 dataset are of poor quality, so the chromosome boundaries are noisy. As a result, the skeletons had several unwanted branches that required pruning. Numerous skeleton pruning methods exist, each with its own advantages and disadvantages. The method used here computes quickly and allows control over the desired number of branches in the skeleton. For this purpose, the skeleton pruning algorithm based on contour partitioning with discrete curve evolution (DCE) [5] was chosen, and it is discussed here briefly. Skeleton pruning by contour partitioning is robust to noise and produces a stable skeleton [5]. A few definitions are given below to aid the explanation.

    It is assumed that a planar set D is the closure of a connected subset of ℝ² and that its boundary ∂D consists of simple closed curves, which are analytic and polygonal. According to Blum [6], the skeleton S(D) of a set D is the locus of the centres of closed disks in D that touch ∂D and are not subsets of any other disk in D. Formally, skeleton pruning is then defined as the elimination of every s ∈ S(D) whose generating points lie in the same open contour segment. Tan(s) is the set of points of intersection of the maximal disc centred at s with ∂D, and the degree of s, deg(s), is defined as the cardinality of Tan(s).

    Any closed digital curve can be regarded as a polygonal curve with many vertices, each pixel being a vertex. DCE therefore returns a simpler representation of ∂D as follows. Initially, the contour is divided into segments, with the number of vertices equal to the number of pixels on ∂D. Then, in every evolution step, two neighboring line segments s1 and s2 are replaced by a single segment joining the endpoints of s1 ∪ s2. The substitution is based on the relevance measure K, given by:

    $$K(s_1, s_2) = \frac{\beta(s_1, s_2)\, l(s_1)\, l(s_2)}{l(s_1) + l(s_2)}$$

    In the above formula, s1 and s2 are line segments incident on a common vertex v, β(s1, s2) is the turn angle at v, and l is the length function normalized with respect to the total length of the polygonal curve. The properties of K of interest here are [3] and [4]. A higher value of K(s1, s2) signifies a relatively higher contribution of s1 ∪ s2 to the overall shape of the contour. Starting with an input polygon P with n vertices, DCE successively produces simpler polygons $P = P^{n}, P^{n-1}, \ldots, P^{3}$. Each $P^{n-m}$ is obtained by removing the vertex v with the smallest K from $P^{n-(m-1)}$. The S(D) so obtained is then thinned recursively until a one-pixel-wide skeleton is obtained.
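The vertex-removal loop of DCE can be sketched as follows. This is a minimal illustration that only simplifies a closed polygon, for example to the 20 vertices used later for boundary smoothing. It does not perform the skeleton pruning of [5]. Lengths are normalized by the perimeter of the current polygon, which rescales every K by the same factor and so does not change which vertex is removed. The function names are hypothetical.

```python
import math


def turn_angle(p_prev, p, p_next):
    """Absolute turn angle (radians, in [0, pi]) at p along p_prev -> p -> p_next."""
    a1 = math.atan2(p[1] - p_prev[1], p[0] - p_prev[0])
    a2 = math.atan2(p_next[1] - p[1], p_next[0] - p[0])
    d = abs(a2 - a1) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def relevance(poly, i, perimeter):
    """K(s1, s2) at vertex i of a closed polygon; lengths normalized by perimeter."""
    prev, cur, nxt = poly[i - 1], poly[i], poly[(i + 1) % len(poly)]
    l1 = math.dist(prev, cur) / perimeter
    l2 = math.dist(cur, nxt) / perimeter
    if l1 + l2 == 0:
        return 0.0
    return turn_angle(prev, cur, nxt) * l1 * l2 / (l1 + l2)


def dce(poly, n_vertices):
    """Repeatedly delete the vertex with the smallest K until n_vertices remain."""
    poly = list(poly)
    while len(poly) > max(n_vertices, 3):
        n = len(poly)
        perimeter = sum(math.dist(poly[k], poly[(k + 1) % n]) for k in range(n))
        if perimeter == 0:
            break  # degenerate polygon: all vertices coincide
        k_values = [relevance(poly, i, perimeter) for i in range(n)]
        poly.pop(k_values.index(min(k_values)))
    return poly
```

For a pixel contour `contour` given as a list of (x, y) tuples, `dce(contour, 20)` would return a 20-vertex polygon. Collinear pixels (turn angle 0, hence K = 0) are removed first.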

    C. Seed Region Growing

    When DCE is forced to produce a skeleton with four endpoints, it also returns two bifurcation points. The set of bifurcation points is defined as {s ∈ S(D) : deg(s) ≥ 3}. A seed region, as the term is used here, is the part of S(D) that connects the two bifurcation points. The medial axis M(D) of the entire chromosome is obtained by growing this seed region until it includes two points of ∂D, which signals the algorithm to stop expanding the seed region. To simplify the discussion, the seed region will also be referred to as M(D). Since the emphasis is on growing the seed region into the medial axis, this should not cause confusion.

    In an image, M(D) is described by two column vectors, X and Y, that give the locations of the pixels constituting M(D), so that M(D) = [X Y]. X always contains a monotonically increasing or decreasing sequence. This is because, for a chromosome whose length is greater than its breadth, M(D) is spatially distributed along the length, as shown in Fig. 1d.

    A smoothed form of ∂D is also obtained, and its purpose will become clear later. Since the boundary of the chromosome is very irregular, it is smoothed using a three-step process:

    1) An image opening operation is performed on the chromosome subimage with a structuring element, as noted below:

    $$A \circ B = \bigcup \{ (B)_z \mid (B)_z \subseteq A \} = (A \ominus B) \oplus B$$

    where A is the chromosome image and B is the structuring element shown in Fig. 1g.

    2) Discrete curve evolution is applied with 20 vertices on the boundary.

    3) All elements of ∂D except the 20 vertices found using DCE are discarded, and neighboring vertices are connected to form a closed polygonal curve. This boundary will be referred to as D2.
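The opening in step 1 can be sketched with pure-Python binary morphology. The 3×3 square structuring element below is an assumption standing in for the element of Fig. 1g, whose exact shape is not recoverable here. Pixels outside the image are treated as background, which the padding step makes harmless.

```python
def erode(img, se):
    """A (-) B = {z | B_z subset of A}; out-of-image pixels count as background."""
    h, w = len(img), len(img[0])
    return [[int(all(0 <= r + dr < h and 0 <= c + dc < w and img[r + dr][c + dc]
                     for dr, dc in se))
             for c in range(w)]
            for r in range(h)]


def dilate(img, se):
    """A (+) B = {a + b | a in A, b in B}, clipped to the image."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if img[r][c]:
                for dr, dc in se:
                    if 0 <= r + dr < h and 0 <= c + dc < w:
                        out[r + dr][c + dc] = 1
    return out


def opening(img, se):
    """A o B = (A (-) B) (+) B: union of all translates of B that fit inside A."""
    return dilate(erode(img, se), se)


# Hypothetical structuring element: offsets (row, col) of a 3x3 square.
SQUARE_3X3 = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
```

Thin protrusions narrower than the structuring element are removed, while the bulk of the chromosome is preserved. This is why opening is a reasonable first step before the 20-vertex DCE.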

    To grow M(D) is to include an extra element EM = (X(i), Y(i)). In the context of this work, the point p to be included should satisfy the following:

    1) Let a curve C represent the spatial distribution of M(D) at EM. Then the normal norm(p) to C at EM intersects ∂D at points D1 and D2 such that ‖D1 − EM‖ ≈ ‖D2 − EM‖. Here the operator ‖pt1 − pt2‖ denotes the Euclidean distance between points pt1 and pt2.

    2) For EM with coordinates (X(i), Y(i)), X(i) ∉ X, and EM lies close to the endpoints of M(D), named EP1 and EP2. Here, close means ‖EP(1 or 2) − EM‖ ≤ T, where T is a threshold parameter ensuring that EM lies close to either EP1 or EP2. This is an intuitive measure. A predicted point lying too far from EP1 or EP2 might induce oscillations in C, which are unlikely given the usual structure of chromosomes in the dataset. Alternatively, such a point may be attributed to a small protuberance on the boundary and should be ignored.

    The algorithm begins by finding a plausible candidate p, which will be referred to as the primary prediction. This is done by first finding the next element in the sequence X and then predicting p using a nonparametric machine learning algorithm, namely locally weighted linear regression. A training set S is formed by taking pn elements from X and Y that lie towards their extremities, as shown in Fig. 3. In the following, Sx and Sy denote the input feature vector component and the target variable vector component of S, respectively. A hypothesis h is then defined such that:

    $$h_\theta(x) = \theta_0 + \theta_1 x$$

    hθ(x) is the function used to predict the output value for an input x, and the θs are the parameters of the hypothesis h. To constrain hθ(x) to attain values as described by the vector Sy, a two-step process is performed.

    1) Fit θ to minimize

    $$\sum_{j=0}^{p_n} W(j) \left\{ S_y(j) - h_\theta(S_x(j)) \right\}^2$$

    2) Output the hypothesis h using θᵀSx.

    Here the W(j) are nonnegative weights, and θᵀ is the transpose of the parameter vector θ. In this work, the weight function used was a shifted version of a fairly standard model:

    $$W(j) = \exp\left( -\frac{\left( |S_x(j) - x| - p_n \right)^2}{2\tau^2} \right)$$

    Here x is the query point and τ is the bandwidth parameter that controls the rate of decay of the exponential function. The shift introduced this way gives higher weight to points lying away from the boundary elements of M(D). This is valid because, during subsequent iterations of the growing algorithm, the elements lying away from the endpoints EP1 and EP2 are more likely to have been part of the seed region. Next, the hypothesis function hθ(x) takes the x coordinate of p as input and predicts its y coordinate, and the normal norm(p) to C at p is found. The next step is to decide whether p meets the criteria for EM. If not, a secondary prediction is computed:

    1) Let norm(p) ∩ ∂D = {D1, D2}.

    2) Calculate ‖D1 − p‖ and ‖D2 − p‖.

    3) If ‖D1 − p‖ / ‖D2 − p‖ ≥ 0.8, the prediction is reliable; if not, calculate p2 = 0.5 (D1 + D2).

    4) If p2 is located such that ‖p2 − p‖ ≤ T1, then

    $$E_M = \frac{Wt_1\, p + Wt_2\, p_2}{Wt_1 + Wt_2}$$

    where T1 is a threshold used to either validate or discard a secondary observation p2, and Wt1 and Wt2 are the weights assigned to p and p2 in computing EM. T1 also helps identify small protuberances and irregularities in the contour that do not contribute significantly to the topology of the chromosome. Depending on the shape of the chromosome and the variation along its boundary, it might be desirable to choose the element to be included in M(D) based on p or on p2. This allows control over how fast the slope of C at EM changes. Finally, M(D) is updated by adding EM beyond EP1 or EP2, depending on whether the growing was performed on the upper or lower half of the chromosome subimage. The algorithm then proceeds to the next iteration, with the EM obtained in the last iteration serving as EP1 or EP2. The process terminates when an EM is found beyond the boundary of the chromosome, resulting in a single-line medial axis M(D).
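One growth step can be sketched with two hypothetical helpers. `lwlr_predict` computes the primary prediction by weighted least squares with the shifted Gaussian weight W(j), where the caller passes shift = pn. `choose_element` applies the reliability test and the weighted combination from steps 3 and 4. Finding D1 and D2, by intersecting the normal with the boundary, is assumed to be done elsewhere. Two assumptions go beyond the text. First, the reliability ratio is taken symmetrically as the smaller distance over the larger. Second, when p2 is discarded, the primary prediction p is kept.

```python
import math


def lwlr_predict(sx, sy, x_query, tau, shift):
    """Primary prediction: locally weighted linear regression evaluated at x_query.

    Weights follow W(j) = exp(-(|sx[j] - x| - shift)^2 / (2 * tau^2)).
    """
    w = [math.exp(-((abs(xj - x_query) - shift) ** 2) / (2 * tau ** 2)) for xj in sx]
    sw = sum(w)
    if sw == 0:
        raise ValueError("all weights vanished; increase tau")
    # Closed-form weighted least squares for h(x) = theta0 + theta1 * x.
    xm = sum(wj * xj for wj, xj in zip(w, sx)) / sw
    ym = sum(wj * yj for wj, yj in zip(w, sy)) / sw
    sxx = sum(wj * (xj - xm) ** 2 for wj, xj in zip(w, sx))
    sxy = sum(wj * (xj - xm) * (yj - ym) for wj, xj, yj in zip(w, sx, sy))
    theta1 = sxy / sxx if sxx > 0 else 0.0
    theta0 = ym - theta1 * xm
    return theta0 + theta1 * x_query


def choose_element(p, d1, d2, t1, wt1=1.0, wt2=1.0, ratio_min=0.8):
    """Secondary prediction and combination into E_M.

    p: primary prediction (x, y); d1, d2: intersections of the normal at p
    with the boundary (computed elsewhere).
    """
    r1, r2 = math.dist(d1, p), math.dist(d2, p)
    if max(r1, r2) == 0 or min(r1, r2) / max(r1, r2) >= ratio_min:
        return p  # p is roughly equidistant from both sides: reliable
    p2 = (0.5 * (d1[0] + d2[0]), 0.5 * (d1[1] + d2[1]))  # midpoint of the chord
    if math.dist(p2, p) <= t1:
        return ((wt1 * p[0] + wt2 * p2[0]) / (wt1 + wt2),
                (wt1 * p[1] + wt2 * p2[1]) / (wt1 + wt2))
    return p  # p2 discarded (e.g. a protuberance); keep p (assumption)
```

With equal weights, an accepted p2 pulls EM halfway towards the chord midpoint. Raising Wt2 relative to Wt1 makes the axis follow the boundary geometry more closely.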

    D. Mesh laying & geometrically compensated image

    After an accurate medial axis has been obtained, the next step is geometrical compensation of the image for feature extraction. For this purpose, the chromosome was assumed to be a two-dimensional deformable surface. A mesh was distributed over the chromosome to capture the spatial distribution of intensity levels and preserve it for an accurate shape reconstruction. This mesh was then transformed to reconstruct the chromosome shape. Mesh laying over the chromosome and geometrical compensation of the chromosome subimage were performed through the following steps:

    1) Smoothing of the medial axis using successive cubic splines, first with knots at intervals of 3, then 4, and finally 8 (Fig. 5).

    2) Drawing the orthogonal line orth(s) for every s ∈ M(D), and finding the set IS = {orth(s) ∩ ∂D}.

    3) Extracting the profile of the chromosome subimage lying between the elements of IS and transferring it into a new image to reconstruct the geometrically compensated chromosome subimage.

    Fig. 3. (a), (b) Chromosomes from class 2 along with their band profiles; the correspondence between the profiles can be observed.

    The band profile of the chromosome subimage is then computed as the average intensity H(n) across each row n of the shape-compensated image:

    $$H(n) = \frac{1}{N} \sum_{i=1}^{N} I(n, i)$$

    where N is the number of columns in the shape-compensated chromosome subimage and I(n, i) is the intensity of the pixel in row n and column i.
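For a straightened image stored as a list of rows, the formula is a row-wise mean. A minimal sketch follows; the function name is hypothetical. As the formula states, any background pixels in a row are averaged in as well.

```python
def band_profile(image):
    """H(n) = (1/N) * sum_{i=1..N} I(n, i): the mean intensity of row n."""
    return [sum(row) / len(row) for row in image]
```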

    III. RESULTS AND DISCUSSION

    Fig. 4 and Fig. 5 show the effectiveness of the algorithm in finding a smooth medial axis that can be used to compute the band profile. In the context of karyotyping, it is important that the orthogonal lines orth(s), s ∈ M(D), do not intersect one another. This objective was met, as can be confirmed by visual inspection of Fig. 5. Fig. 5c and Fig. 5d demonstrate the robustness of the algorithm in accurately predicting the elements to be included in the medial axis. It was able to grow a seed region with 10 elements (Fig. 5c) into an M(D) with 60 elements (Fig. 5d). Figure 7 shows the band profiles for a pair of chromosomes from class 2. The correspondence between the band profiles can be confirmed visually and by their normalized correlation.
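The text does not define its normalized correlation. The sketch below assumes the common zero-mean form (Pearson correlation), which lies in [−1, 1], and truncates profiles of unequal length to the shorter one. Both choices are assumptions for illustration.

```python
import math


def normalized_correlation(h1, h2):
    """Zero-mean normalized correlation of two band profiles, in [-1, 1]."""
    n = min(len(h1), len(h2))  # truncate to the shorter profile (assumption)
    a, b = h1[:n], h2[:n]
    ma, mb = sum(a) / n, sum(b) / n
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        return 0.0  # a constant profile carries no banding information
    return sum(x * y for x, y in zip(da, db)) / denom
```

Subtracting the means makes the score insensitive to an overall brightness offset between the two chromosome images, and dividing by the norms removes a global contrast scale.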

    IV. FUTURE WORK

    An accurate algorithm for computing the medial axis of chromosomes from the LK1 dataset is very motivating. We were therefore interested in the pairing results obtainable from band profiles alone, although it was known that accurate results are not possible using band profiles only. To avoid measurement degradation during the comparison, the two band profiles were first aligned by estimating a shift constant δ such that

    $$\delta = \arg\max_{\delta} \left\{ \rho\big(h_1(n), h_2(n - \delta)\big) \right\}$$

    where ρ(x, y) = xᵀy is the correlation function of the column vectors x and y, and h1(n) and h2(n) are the band profiles of the first and second chromosomes of the same class. The optimization is performed by evaluating the correlation function for δ ∈ {−10, …, 10} and choosing the value that maximizes it. The distance between the chromosomes is the Euclidean distance between one profile and the other, shifted by δ:

    $$d(i, j) = \left\| h_i(n) - h_j(n - \delta) \right\|_2$$

    In experiments on the LK1 dataset, the algorithm paired chromosomes of the same class perfectly in 10 out of 22 cases on average.
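The alignment and distance can be sketched as below. The text does not say how the ends of the profiles are handled, so the sketch assumes that h2(n − δ) is zero outside the profile's support and that h1 fixes the comparison length. Ties in the correlation are resolved by taking the first (most negative) maximizing shift.

```python
import math


def shifted(h, delta, length):
    """h(n - delta) for n = 0..length-1, zero outside h's support (assumption)."""
    return [h[n - delta] if 0 <= n - delta < len(h) else 0.0 for n in range(length)]


def align_and_distance(h1, h2, max_shift=10):
    """Return (delta, d): the shift maximizing rho = h1^T h2(n - delta) over
    delta in {-max_shift, ..., max_shift}, and the Euclidean distance after shifting."""
    def rho(delta):
        return sum(a * b for a, b in zip(h1, shifted(h2, delta, len(h1))))

    delta = max(range(-max_shift, max_shift + 1), key=rho)
    s = shifted(h2, delta, len(h1))
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, s)))
    return delta, d
```

A pairing procedure could then compute d(i, j) for every pair of profiles and match each chromosome to the one at the smallest distance.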

    V. CONCLUSION

    In this paper, an algorithm to accurately extract the band profile of chromosomes was presented. The algorithm was found to be robust and is computationally inexpensive to implement. With this vital tool for automatic karyotyping in place, we next intend to develop an accurate chromosome pairing system. This system will use additional features, such as geometrical properties of chromosomes, along with statistical data.

    REFERENCES

    [1] J. Piper and E. Granum, "On fully automatic feature measurement for banded chromosome classification," Cytometry, no. 10, pp. 242-255, 1989.

    [2] J. R. Stanley, M. J. Keller, P. Gader, and W. C. Caldwell, "Data-driven homologue matching for chromosome identification," IEEE Trans. Med. Imag., vol. 17, no. 3, pp. 451-462, 1998.

    [3] A. Khmelinskii, R. Ventura, and J. Sanches, "A Novel Metric for Bone Marrow Cells Chromosome Pairing," IEEE Trans. Biomed. Eng., to be published.

    [4] J. H. Kao, J. H. Chuang, and T. P. Wang, "Automatic Chromosome Classification Using Medial Axis Approximation and Band Profile Similarity," in LNCS, vol. 3852, P. J. Narayanan et al., Eds. Berlin: Springer Verlag, 2006, pp. 274-283.

    [5] X. Bai, L. J. Latecki, and W. Y. Liu, "Skeleton Pruning by Contour Partitioning with Discrete Curve Evolution," IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 3, pp. 449-462, Mar. 2007.

    [6] H. Blum, "Biological Shape and Visual Science (Part I)," J. Theoretical Biology, vol. 38, pp. 205-287, 1973.

    [7] L. J. Latecki and R. Lakämper, "Shape Similarity Measure Based on Correspondence of Visual Parts," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 10, pp. 1185-1190, Oct. 2000.

    [8] L. J. Latecki and R. Lakämper, "Application of Planar Shape Comparison to Object Retrieval in Image Databases," Pattern Recognition, vol. 35, no. 1, pp. 15-29, 2002.