
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

IEEE SYSTEMS JOURNAL 1

A Novel Automated Microscopy Platform for Multiresolution Multispectral Early Detection of Lung Cancer Cells in Bronchoalveolar Lavage Samples

Thomas Pengo, Arrate Muñoz-Barrutía, Senior Member, IEEE, and Carlos Ortiz-de-Solórzano, Senior Member, IEEE

Abstract—Lung cancer is the deadliest form of cancer, mainly because of the absence of reliable early diagnostic protocols. Therefore, there is increasing interest in the development of novel noninvasive diagnostic technologies that may improve the early detection of the disease. Bronchoscope-guided bronchoalveolar lavage (BAL) is a minimally invasive diagnostic technique that is based on the extraction and analysis of cellular material from the bronchial epithelium of patients who present suspicious lung masses on low-dose screening X-ray computed tomography images. Together with a novel staining technique that combines immunophenotyping of a lung cancer biomarker with fluorescent in situ hybridization of genetically abnormal DNA loci, BAL promises a powerful early diagnostic tool for lung carcinomas. The sensitivity of this method, however, is highly dependent on the pathologist's ability to reliably and repeatedly examine thousands of cells under the microscope. This is an extremely labor-intensive and error-prone task. We have developed a multiscale multidimensional integrated microscopy computer-aided detection platform that autonomously scans and analyzes BAL samples. In this paper, we describe its software architecture and validate the specific image analysis protocols that are developed for this particular application.

Index Terms—Automated microscopy, computer-aided detection (CAD), fluorescent in situ hybridization (FISH), fluorescence microscopy, immunofluorescence, lung cancer, minimal samples.

Manuscript received July 27, 2012; revised March 13, 2013; accepted September 23, 2013. This work was supported in part by the UTE Project Center for Applied Medical Research (CIMA), by the Spanish Science and Innovation Ministry (MICINN) through their subprogram for unique strategic projects under Grant MICINN PSE SINBAD and Grant PSS 0100000-2008-2 and through their program for nonfundamentally directed research grants under Grant MCYT TEC2005-04732 and Grant MICINN DPI2009-14115-C03-03. The works of A. Muñoz-Barrutía and C. Ortiz-de-Solórzano were supported by the MICINN under the Ramón y Cajal Fellowship.

T. Pengo was with the Center for Applied Medical Research (CIMA), University of Navarra, 31008 Pamplona, Spain. He is now with the Advanced Light Microscopy Unit, Center for Genomic Regulation (CRG), 08003 Barcelona, Spain.

A. Muñoz-Barrutía and C. Ortiz-de-Solórzano are with the Center for Applied Medical Research (CIMA), University of Navarra, 31008 Pamplona, Spain, and also with the School of Engineering TECNUN, University of Navarra, 20018 San Sebastián, Spain.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JSYST.2013.2289152

I. INTRODUCTION

Lung cancer is one of the leading causes of mortality worldwide, and it is the most common cause of death from cancer in men and the second most common in women [1]. The prognosis of more than 80% of patients with lung cancer is dismal, mostly because the advanced stage at the time of diagnosis precludes curative surgery. In 2005, the five-year survival rates for men and women diagnosed with lung cancer were 13.6% and 17.2%, respectively [2]. Therefore, more than any other type of cancer, it could benefit from the development of novel automated computer-aided detection (CAD) tools that may help in accelerating the diagnosis and, consequently, in improving the very low survival rate of the patients.

Bronchoalveolar lavage (BAL) is a diagnostic method that is suitable for the detection of asymptomatic lung cancer [3] in patients who present suspicious masses in chest X-ray or computed tomography screening images. To obtain BAL samples, a bronchoscope is placed into the bronchi, and a sterile solution is introduced into the airways and then collected for microscopic examination. The sample contains secretions, cells, soluble proteins, lipids, and other chemical constituents from the epithelial surface of the lower respiratory tract. In patients with lung cancer, it may also contain a few cancer cells that are exfoliated from tumors located in or crossed by the airways, together with a large proportion of nontumoral epithelial and nonepithelial cells. Commonly, the extracted solution is Giemsa or Papanicolaou stained and is analyzed by standard cytology. This method produces a high number of false positives [4] and thus is not commonly accepted for routine diagnosis.

To improve the sensitivity of the detection, BAL samples can be fluorescently stained using the Fluorescence Immunophenotyping and Interphase Cytogenetics for the Investigation of Neoplasms (FICTION) protocol. FICTION combines the immunofluorescent detection of the heterogeneous nuclear ribonucleoprotein (hnRNP) A1 [5], [6], which is commonly overexpressed in lung cancer cells, with the fluorescent in situ hybridization (FISH) labeling of chromosome regions that are commonly altered in lung cancer. The presence of the immunofluorescent protein is used to select candidate cancer cells. The genomic analysis (i.e., the study of gene copy number) that is based on the four labeled DNA probes is then used to decide whether a candidate cell is indeed a cancer cell or a false positive (see Fig. 1). This is done visually, starting with a low-magnification 2-D screening of the immunofluorescent cells, followed by a high-magnification analysis of the numerical genomic alterations of the selected cells in 3-D (i.e., changing the focal plane to enumerate all the DNA probes within the nucleus that is being analyzed).

1932-8184 © 2013 IEEE

Fig. 1. Image of a cancer cell stained with FICTION. Antibody for the nuclear protein hnRNPA1 (in blue) and the four DNA probes in LAVysion: one centromeric probe for chromosome 6 in SpectrumAqua, 5p15.2 in SpectrumGreen, c-MYC in SpectrumGold, and 7p12 in SpectrumRed.

Visually analyzing BAL samples is an enormously time-consuming and error-prone task. To address this issue, we have developed an automated microscopy system that autonomously scans the BAL samples that are stained with FICTION. Our system seamlessly integrates the multispectral and multiresolution acquisition and analysis of BAL samples, and it takes care of the storage of the images and analysis results. Specifically, the system acquires multiple planes in several spectral channels and can autonomously guide the acquisition of an entire slide. Since the focal plane varies along the slide, automated focusing is needed to keep the images sharp. Furthermore, the system can image objects at one magnification and find them at a later time using a higher magnification objective lens. The images are stored, and they remain accessible across a network. To provide consistency, the relationship between the images that are taken from a slide and the objects that are contained in them is maintained at each stage of the process. Regarding the analysis, the system detects immunofluorescence-positive cells in the initial low-magnification scan. Then, after the second high-magnification scan, those cells are segmented in 3-D along with the FISH signals that label the genes of interest. To improve time performance, the analysis is done in parallel with the acquisition of new images. All image analysis routines, both 2-D and 3-D, have been integrated with the acquisition software. Additionally, 3-D computational sectioning is provided through the integration of deconvolution routines. Finally, the system has an open and modular design; thus, it is able to accommodate new functionalities without major reengineering.

To the best of our knowledge, no commercially available system seamlessly integrates all three key tasks, i.e., acquisition, analysis, and storage, as required in this application and as described earlier. Thus, we have developed such a system by combining in-house software with both proprietary and freeware software.

In this paper, we describe and evaluate the performance of our system. This paper is organized as follows. Section II details the implementation of our current system, i.e., the hardware setup and image processing. Section III shows, step by step, the performance of the system for our particular application. Finally, Section IV highlights some relevant parts of this paper and outlines future developments.

II. MATERIAL AND METHODS

A. Synthetic and Real Images

1) Synthetic Images: We created 100 synthetic 256 × 256 three-channel images containing four round objects each, simulating cell nuclei. Two nuclei are mildly autofluorescent, one nucleus displays strong autofluorescence, and the fourth is a stained nucleus with a strong emission in the blue channel. Autofluorescent nuclei are assigned similar intensities in all channels.

There are ten parameters that must be selected to create each image:

1) the emission intensity of the strong autofluorescent nucleus (one parameter);

2) the emission intensity of the two mild autofluorescent nuclei (one parameter);

3) the emission intensities of the three spectral channels of the stained nucleus (three parameters);

4) the transmittance of the three emission filters, corresponding to the three acquisition channels (three parameters);

5) the amount of additive noise (one parameter);

6) the amount of multiplicative noise (one parameter).

Each parameter is assigned an interval of values, thus defining a 10-D parameter space. The parameter combinations used to generate the images are chosen so that they are uniformly distributed in this space.
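The selection of the parameter combinations can be sketched as follows. This is a minimal illustration, assuming that "uniformly distributed" means independent uniform random draws within each interval; a regular grid or a quasi-random sequence would be equally compatible with the description. The parameter names and the numeric intervals are hypothetical, since the actual ranges are not listed here.

```python
import random

# Hypothetical intervals (arbitrary units); the real ranges are not given in the text.
PARAMETER_INTERVALS = {
    "strong_af_intensity": (0.6, 1.0),   # strong autofluorescent nucleus
    "mild_af_intensity": (0.2, 0.5),     # the two mild autofluorescent nuclei
    "stained_blue": (0.6, 1.0),          # stained nucleus, one value per channel
    "stained_aqua": (0.0, 0.3),
    "stained_red": (0.0, 0.3),
    "filter_blue": (0.7, 1.0),           # emission filter transmittances
    "filter_aqua": (0.7, 1.0),
    "filter_red": (0.7, 1.0),
    "additive_noise": (0.0, 0.1),
    "multiplicative_noise": (0.0, 0.1),
}


def sample_parameters(n_images, seed=0):
    """Draw one 10-D parameter vector per synthetic image, uniformly per interval."""
    rng = random.Random(seed)
    return [
        {name: rng.uniform(low, high)
         for name, (low, high) in PARAMETER_INTERVALS.items()}
        for _ in range(n_images)
    ]


parameter_sets = sample_parameters(100)
```

Fixing the seed makes the synthetic data set reproducible, which is convenient when the same images are reused to compare classifiers.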

2) Real Images: Sample Preparation: The samples that were used to test our system were BAL samples from subjects with no signs or family history of lung cancer. The samples were spiked with cancer cells from two established human adenocarcinoma cell lines (A549 and H460) and were stained with FICTION. First, we immunostained the sample with an antibody for the nuclear protein hnRNPA1, which is highly expressed in nearly all lung cancers but only at basal levels in normal lung epithelia. The nuclei were counterstained using TO-PRO-3. The samples were also labeled with the LAVysion kit, which contains four FISH probes: three that target common loci of genetic alterations in lung cancer (5p15.2, in SpectrumGreen; c-MYC, in SpectrumGold; and 7p12, in SpectrumRed) and a centromeric probe (6c, in SpectrumAqua). (See Fig. 1 for an example of a cancer cell that is stained with FICTION.)

We used normal human lymphocytes as control samples.

B. Hardware

Our system is based on an AxioPlan2ie microscope (Zeiss, Wetzlar, Germany), which is equipped with an automated MAC 5000 scanning stage and robotic slide-loader (Ludl, Hawthorne, NY, USA). The camera is a Photometrics CoolSnap (Roper Scientific, Tucson, AZ, USA). All images were acquired with two objective lenses, i.e., a 20×/0.75 numerical aperture (NA) Plan Apochromat and a 40×/0.95 NA Plan Apochromat, both by Zeiss. A summary of the excitation and emission bands of the six filter sets that are used can be found in Table I. These filters were chosen to fit the spectra of the fluorochromes that are used in our application (see Table II).

TABLE I. FILTER CUBES

TABLE II. FLUOROCHROMES AND CORRESPONDING FILTER CUBES

Fig. 2. Platform architecture. Each solid box represents a different module. Solid lines indicate the interactions between modules or between one module and the underlying hardware. The color filling the boxes represents the programming language that was used to write the code. The two larger boxes with dotted lines represent the workstation that is connected to the microscope (WS1) and the workstation that is dedicated to the deconvolution, and image storage and analysis (WS2).

C. Software Architecture

The control software was developed entirely in-house, from the basic device drivers to the high-level control operations. Fig. 2 shows the architecture and organization of the main software modules. When two modules are shown on top of each other, the top module makes use of the services that are provided by the module immediately below it, adding a level of abstraction.

1) Software Modules: The software modules are distributed in two workstations. Most modules in the Acquisition workstation (WS1), which is connected to the microscope, control the acquisition process. The modules running on the Analysis workstation (WS2) deal with the analysis and storage of the images and associated results.

At the lowest level of WS1, we find the device drivers for the microscope, the stage, and the camera. Immediately above them lies the Hardware Abstraction Layer (HAL), which provides a uniform interface with the hardware through a set of abstract calls to simple actions of the microscope and the camera. Above the HAL, the Scanner is a collection of predefined command sequences that perform complex operations, such as multidimensional acquisitions or area scans, along with operations that require feedback from other components of the system, such as deconvolution or database access.

Other modules provide common services, such as error propagation (in WS1), image deconvolution, image analysis, and image and data storage (in WS2). The deconvolution server is based on Huygens Scripting (Scientific Volume Imaging, Amsterdam, The Netherlands) and performs computational optical sectioning by removing background and stray light from 3-D stacks of widefield images. The software is controlled by Java wrapping code, which is interfaced to the network for remote operation.

The error and exception bus handles and distributes software exceptions. It is a common virtual dispatcher for any error event signal launched by any software module. When an error occurs, a notification is sent through the bus to all listening modules.
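The dispatching behavior described above can be illustrated with a small publish/subscribe sketch. It is not the platform's code (which is written in Java); the class name `ErrorBus`, its methods, and the module names used in the example are all hypothetical.

```python
class ErrorBus:
    """Minimal dispatcher: every subscribed listener is notified of every error."""

    def __init__(self):
        self._listeners = []

    def subscribe(self, callback):
        """Register a callable taking (source_module, error)."""
        self._listeners.append(callback)

    def publish(self, source, error):
        """Forward an error raised by `source` to all listening modules."""
        for callback in list(self._listeners):
            callback(source, error)


received = []
bus = ErrorBus()
# Two hypothetical listening modules that simply record what they receive.
bus.subscribe(lambda source, error: received.append(("scanner", source, str(error))))
bus.subscribe(lambda source, error: received.append(("storage", source, str(error))))

try:
    raise RuntimeError("stage timeout")   # simulated failure in a hypothetical HAL call
except RuntimeError as exc:
    bus.publish("HAL", exc)
```

Routing errors through one dispatcher keeps the modules decoupled: a module that raises an error does not need to know which other modules react to it.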

2) Programming Languages: The device drivers were written in C [7]. All intermediate layers that are located above were written in Java (Oracle, Redwood Shores, CA, USA) to take advantage of its platform independence, object orientation, and the availability of many application libraries. All high-level algorithms were written in MATLAB (The Mathworks, Natick, MA, USA). MATLAB has direct access to Java classes. Since the HAL was written in Java, MATLAB can readily access the microscope control layer. Because MATLAB is an interpreted language, changes to the code can be made dynamically at runtime. Additionally, MATLAB has direct access to the image processing and analysis routines in DipLib¹ (TU Delft, Delft, The Netherlands) through DipImage. Finally, the ability to build a graphical user interface (GUI) directly from the environment is an additional benefit.

D. Software Function: Image Analysis and Storage Workflow

Any sequence of operations implementing a high-level application is called a workflow. Workflows request, in order, access to microscope operations, image analysis functions, and the allocation of storage resources. To optimize the effective access to these resources by one or several concurrent workflows, we have implemented a pipeline approach using a producer–consumer model. Each step is concurrently executed, and each acts as a consumer for the previous step and as a producer for the next step. Whenever a consumer is out of inputs, it waits for the next input to appear. This allows each step to be run in parallel in a different MATLAB instance. This way, the throughput of the workflow becomes the inverse of the duration of the slowest step, instead of the inverse of the total workflow time [8]. The memory requirements for an additional MATLAB instance are not very high; apart from the base memory consumption of MATLAB itself, little memory is needed as each object is separately processed, and only one object/field at a time needs to be stored in memory.
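The producer–consumer pipeline can be sketched in a few lines. The platform runs each step in a separate MATLAB instance; the sketch below swaps in Python threads and in-memory queues purely for illustration, and the three stage functions are hypothetical stand-ins for the scanning, preprocessing, and analysis steps.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the input stream


def run_stage(func, inbox, outbox):
    """Consume items from inbox, apply func, and produce results to outbox."""
    while True:
        item = inbox.get()            # blocks while this stage has no input
        if item is _DONE:
            outbox.put(_DONE)         # propagate shutdown to the next stage
            return
        outbox.put(func(item))


def run_pipeline(stage_funcs, items):
    """Run the stages concurrently, each feeding the next through a queue."""
    queues = [queue.Queue() for _ in range(len(stage_funcs) + 1)]
    threads = [
        threading.Thread(target=run_stage, args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate(stage_funcs)
    ]
    for thread in threads:
        thread.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(_DONE)
    results = []
    while True:
        out = queues[-1].get()
        if out is _DONE:
            break
        results.append(out)
    for thread in threads:
        thread.join()
    return results


# Hypothetical stand-ins for the scan, preprocessing, and analysis steps.
results = run_pipeline(
    [lambda x: x + 1, lambda x: x * 2, lambda x: f"object-{x}"],
    range(3),
)
```

In steady state, such a pipeline with step durations t1, t2, and t3 delivers one result every max(t1, t2, t3), whereas a strictly sequential workflow delivers one every t1 + t2 + t3.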

We now describe the main functionalities of the software modules: the storage manager, the relational database, and the image acquisition and analysis.

1) Storage Manager: The Storage Manager handles the storage of objects in a network-attached storage (NAS) device and implements the producer–consumer model described earlier. It acts as an active buffer between each step and the next by providing primitives to store and read objects. It also deals with blocking actions, such as a read operation on an empty buffer. Each object in the buffer is stored as a file in the NAS. Unlike producer–consumer models that use fixed-size buffers, the space is, for practical purposes, unlimited, and storing an object is a nonblocking operation. If no space is left, the write primitive returns an error. Whenever the buffer is empty, a read operation puts the consumer on hold until a new element is produced.
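The buffer semantics described above (nonblocking writes, blocking reads, one file per object) can be sketched as follows. This is an illustrative assumption of how such primitives might look, not the Storage Manager itself: the class name `FileBuffer`, the file naming scheme, the polling loop, and the optional timeout are hypothetical, and the sketch assumes a single producer and a single consumer per buffer.

```python
import os
import pickle
import tempfile
import time


class FileBuffer:
    """Directory-backed FIFO buffer between two workflow steps."""

    def __init__(self, directory):
        self.directory = directory
        self._next_id = 0

    def write(self, obj):
        """Nonblocking store; raises OSError if the storage device is full."""
        final = os.path.join(self.directory, f"{self._next_id:08d}.obj")
        temporary = final + ".tmp"
        with open(temporary, "wb") as handle:
            pickle.dump(obj, handle)
        os.replace(temporary, final)   # rename so readers never see partial files
        self._next_id += 1

    def read(self, poll_interval=0.05, timeout=None):
        """Return the oldest object, waiting while the buffer is empty."""
        deadline = None if timeout is None else time.monotonic() + timeout
        while True:
            names = sorted(n for n in os.listdir(self.directory) if n.endswith(".obj"))
            if names:
                path = os.path.join(self.directory, names[0])
                with open(path, "rb") as handle:
                    obj = pickle.load(handle)
                os.remove(path)
                return obj
            if deadline is not None and time.monotonic() >= deadline:
                raise TimeoutError("buffer is empty")
            time.sleep(poll_interval)


with tempfile.TemporaryDirectory() as folder:
    buffer = FileBuffer(folder)
    buffer.write({"object_id": 1})
    buffer.write({"object_id": 2})
    first = buffer.read()
    second = buffer.read()
```

Because the buffer lives on shared storage, the producer and the consumer can run on different workstations, and the contents survive a restart of either process.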

2) Relational Database: Our storage system consists of a relational database that organizes the results of the image analysis in a hierarchical structure. The database engine is the Hyper Structured Query Language Database (HSQLDB), which is an open-source lightweight embeddable database management system that is entirely written in Java. All the database operations can be accessed from MATLAB and Java. The database was interfaced to MATLAB by mapping each row of a database table to a MATLAB structure. The structure fields contain data from the table itself and from all related tables, thus simplifying the access and hiding the structure of the tables in the database.

¹ http://www.diplib.org

Fig. 3. Image analysis workflow. After autofocusing, in the first phase, the sample is searched for possible candidates. After the classification step, the selected objects are revisited with a higher magnification and resolution objective. Note that the scanning (revisit), preprocessing, and analysis steps can be executed in parallel.

3) Image Acquisition and Analysis: The image analysis workflow runs in two phases: In the first phase, the system scans the samples at low magnification (20×), finding candidate nuclei through the quantification of the immunomarker, and in the second phase, the system revisits the candidate cells in 3-D at high magnification (40×) to analyze their genetic integrity by counting the number of copies of the genes that are labeled using FISH. Both phases include three steps: preprocessing, segmentation, and classification (see Fig. 3). Before the first phase, the sample is automatically focused using the Halton sampling scheme that is described in [9].

a) Immunophenotyping: Low-magnification images are first preprocessed to remove the artifacts caused by uneven illumination and chromatic shift aberrations. Then, to segment the nuclei, we apply the Mexican Hat filter, which is a common tool for spot detection [10], followed by a fixed threshold of 0.1 that was determined empirically. The Mexican Hat filter is constructed as the Laplacian of a Gaussian. The filtered image has a minimum near the centroid of objects whose size is of the order of the Gaussian standard deviation σ, which is set to be similar to the size of a single nucleus (i.e., 10 μm). The segmented mask is labeled using the eight-neighbor connectivity rule. The area that is covered by each label mask is referred to as an object in the following discussion. Each object is assigned a position (the absolute coordinates), which will be used in the second phase to locate the object under the center of the high-magnification objective.
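The property that the filter relies on, a minimum of the response near the centroid of a blob whose size matches σ, can be checked on a toy image. The sketch below is a plain-Python illustration under stated assumptions: the kernel is a sampled Laplacian of a Gaussian truncated at 3σ and shifted to zero mean, the image is a single synthetic Gaussian blob, and the sizes are in pixels rather than micrometers. The thresholding and labeling steps are omitted, and the function names are hypothetical.

```python
import math


def log_kernel(sigma, radius):
    """Sampled Laplacian-of-Gaussian (Mexican Hat) kernel, shifted to zero mean."""
    kernel = []
    for y in range(-radius, radius + 1):
        row = []
        for x in range(-radius, radius + 1):
            r2 = x * x + y * y
            gauss = math.exp(-r2 / (2 * sigma ** 2))
            row.append((r2 - 2 * sigma ** 2) / sigma ** 4 * gauss)
        kernel.append(row)
    mean = sum(map(sum, kernel)) / (2 * radius + 1) ** 2
    return [[value - mean for value in row] for row in kernel]


def convolve(image, kernel):
    """Direct 2-D filtering with zero padding (the kernel is symmetric)."""
    height, width, r = len(image), len(image[0]), len(kernel) // 2
    out = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            acc = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < height and 0 <= jj < width:
                        acc += image[ii][jj] * kernel[di + r][dj + r]
            out[i][j] = acc
    return out


# Synthetic test image: one bright Gaussian blob centered at row 12, column 8.
SIZE, SIGMA = 25, 3.0
image = [[math.exp(-((i - 12) ** 2 + (j - 8) ** 2) / (2 * SIGMA ** 2))
          for j in range(SIZE)] for i in range(SIZE)]
response = convolve(image, log_kernel(SIGMA, radius=9))
min_value, min_pos = min(
    (response[i][j], (i, j)) for i in range(SIZE) for j in range(SIZE)
)
```

Here `min_pos` recovers the blob center and `min_value` is negative, so a threshold applied to the (sign-inverted) response yields one compact mask per nucleus-sized object.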

Each object is classified as Candidate or Autofluorescent based on the intensity of the BLUE channel relative to the AQUA and RED channels. The classifier (Classifier 1) uses nonnegative matrix factorization (NMF) [11] to reduce the dimensionality of the pixels from three channels to two pseudochannels, i.e., one for the blue objects (C) and one for the autofluorescent objects (AF). The staining factor (SF), which is defined as the ratio of the number of pixels that belong to the C pseudochannel to the number of pixels that belong to the AF pseudochannel, is computed for each object. If the number of pixels that belong to the first channel is higher than the number of pixels that belong to the second channel (i.e., SF > 1), the object is classified as C. Otherwise, it is classified as AF. The classifier has been trained to prefer sensitivity (low number of false negatives) to specificity (low number of false positives). False positives, i.e., AF objects classified as C cells, are later dealt with by further classifying C objects into nuclei (N), clusters (Cl), or garbage (G) (NCG). This second classifier (Classifier 2 or NCG) is a three-class support vector machine (SVM) type 1 [12], which is tuned to minimize a figure of cost (FOC) consisting of the sum of the three erroneous classifications, i.e., N or Cl classified as G and Cl classified as N.
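The decision rule based on the staining factor can be written compactly. The sketch below starts after the NMF step and assumes that each pixel already carries two nonnegative pseudochannel coefficients and that a pixel "belongs" to the pseudochannel with the larger coefficient; that assignment rule, the function names, and the example values are assumptions made for illustration.

```python
def staining_factor(pixels):
    """SF of one object; `pixels` is a list of (c, af) pseudochannel coefficients."""
    n_c = sum(1 for c, af in pixels if c > af)      # pixels assigned to C
    n_af = sum(1 for c, af in pixels if af >= c)    # pixels assigned to AF
    return float("inf") if n_af == 0 else n_c / n_af


def classify_object(pixels):
    """Return "C" (candidate) when SF > 1, otherwise "AF" (autofluorescent)."""
    return "C" if staining_factor(pixels) > 1 else "AF"


# Hypothetical objects with three pixels each.
stained = [(0.9, 0.1), (0.8, 0.3), (0.4, 0.5)]          # 2 C pixels, 1 AF pixel
autofluorescent = [(0.2, 0.7), (0.6, 0.5), (0.1, 0.9)]  # 1 C pixel, 2 AF pixels

stained_label = classify_object(stained)                  # SF = 2.0
autofluorescent_label = classify_object(autofluorescent)  # SF = 0.5
```

Counting pixels rather than summing intensities makes the rule insensitive to the absolute brightness of an object, which varies with staining and exposure.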

b) Cytogenetic analysis: All objects classified as N are revisited at high magnification. To this end, the microscope is repositioned on each selected object using the high-magnification (40×) objective lens. The stage is automatically positioned at the center of the object after correcting the shift and rotation caused by the change of objective. Then, the images are acquired as image stacks of 30 slices in 5 channels (AQUA, GREEN, GOLD, RED, and INFRARED). Each stack is first deconvolved to correct the blurring effect of the microscope (point spread function) by using 25 iterations of the quick maximum likelihood estimation (QMLE) algorithm [13] that is included in the Huygens Deconvolution suite. The deconvolution software is automatically called by the MATLAB code after the acquisition is finished.

After deconvolution, a chromatic shift correction is applied. Apochromat objectives, such as the one that is used in this application, are corrected for the three main wavelengths of light (red, green, and blue), meaning that, at those wavelengths, the chromatic shift is close to zero. However, at longer or shorter wavelengths, the displacement is small but observable. This, combined with channel bleed-through, causes the same probe to appear in different channels at different positions, complicating the correct classification of each signal. The shift correction is performed using cross-correlation and translational registration, using the shift values that are precalculated from multiple stained fluorescent beads.
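The registration idea can be illustrated in one dimension: the shift between two channels is taken as the lag that maximizes their cross-correlation, and the moving channel is then translated by that amount. This is a simplified sketch with integer shifts on toy profiles; the function names and the data are hypothetical, and in the platform the shifts are precalculated on bead images and then applied to the sample stacks.

```python
def cross_correlation(reference, moving, shift):
    """Sum of reference[i] * moving[i + shift] over the overlapping samples."""
    n = len(reference)
    return sum(reference[i] * moving[i + shift]
               for i in range(n) if 0 <= i + shift < n)


def estimate_shift(reference, moving, max_shift):
    """Integer lag (in samples) that maximizes the cross-correlation."""
    return max(range(-max_shift, max_shift + 1),
               key=lambda s: cross_correlation(reference, moving, s))


def apply_shift(signal, shift):
    """Translate `signal` back by `shift` samples, padding with zeros."""
    n = len(signal)
    return [signal[i + shift] if 0 <= i + shift < n else 0.0 for i in range(n)]


# Toy intensity profiles of the same bead seen in two channels.
reference = [0, 0, 1, 4, 1, 0, 0, 0, 0, 0]
moving = [0, 0, 0, 0, 1, 4, 1, 0, 0, 0]   # same profile displaced by 2 samples

shift = estimate_shift(reference, moving, max_shift=3)
registered = apply_shift(moving, shift)
```

Estimating the shift once on beads and reusing it is reasonable because the chromatic displacement depends on the optics and the filter set, not on the sample.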

After these preprocessing steps, the FISH spots are segmented using the Otsu thresholding method [14], and each resulting spot is labeled as one of the four possible probes. The assignment is immediate in the AQUA and GOLD channels because the crosstalk affecting those channels is negligible. Therefore, the SpectrumAqua and the SpectrumGold fluorophores can be directly segmented from the corresponding AQUA and GOLD channels. In the GREEN and RED channels, the situation is more complex. In the GREEN channel, there is crosstalk from the SpectrumAqua and SpectrumGold fluorophores, and there is also a strong autofluorescent background. To remove the crosstalk, the area corresponding to the SpectrumAqua and SpectrumGold probes is subtracted from the GREEN channel.

Fig. 4. Channel crosstalk in the GREEN and RED channels. The crosstalk is compensated by segmenting all the channels and by performing subtraction. SpectrumGold and SpectrumAqua probes are subtracted from the GREEN channel, whereas SpectrumGold is subtracted from the RED channel.

Similarly, the SpectrumGold area is subtracted from the RED channel. After subtraction, segmentation can be performed, and the resulting masks entirely belong to the SpectrumGreen and SpectrumRed fluorochromes, respectively. Because of the weak signal of the SpectrumRed and the strong autofluorescence in the GREEN channel, a classifier is needed to separate probes from spurious objects (Classifiers 3a and 3b, respectively). Fig. 4 shows an example of the procedure. We use a naive Bayesian classifier (NBC) on a series of features selected using the J1-score that is described in [15]. The features for the SpectrumRed classifier are the mean, the excess kurtosis, and the minimum intensity, whereas for the SpectrumGreen classifier, we used the surface area, the perimeter-to-area ratio, and the minimum intensity.
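The crosstalk compensation can be sketched as a set operation on segmented areas. The sketch assumes that the subtraction acts on binary masks, represented here as sets of pixel coordinates; whether the platform subtracts masks or intensities is not specified above, and the function name and the coordinates are hypothetical.

```python
def remove_crosstalk(channel_mask, *source_masks):
    """Drop from `channel_mask` every pixel covered by a crosstalk source mask."""
    contaminated = set().union(*source_masks)
    return channel_mask - contaminated


# Hypothetical segmented areas, as sets of (row, column) pixels.
aqua_mask = {(2, 2), (2, 3)}                           # SpectrumAqua spots
gold_mask = {(7, 7), (7, 8)}                           # SpectrumGold spots
green_raw = {(2, 2), (2, 3), (5, 5), (5, 6), (7, 7)}   # GREEN before correction
red_raw = {(7, 7), (7, 8), (9, 1)}                     # RED before correction

# GREEN receives crosstalk from SpectrumAqua and SpectrumGold; RED only from SpectrumGold.
green_mask = remove_crosstalk(green_raw, aqua_mask, gold_mask)
red_mask = remove_crosstalk(red_raw, gold_mask)
```

What remains after the subtraction can still contain autofluorescent debris or weak spurious spots, which is why the probe classifiers are applied afterwards.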

Finally, the number of probes in each channel is used to classify (Classifier 4) the object as either tumoral (+N) or nontumoral (−N) by also using an NBC.

c) Training of the classifiers: Classifiers 1 and 2 were trained using both synthetic and real images. Synthetic images were used to assess the performance of the algorithm, whereas manually classified images from real samples (see Section II-A2) were used to train the classifier parameters for the immunophenotyping.

Classifier 3 was trained on the 30 nuclei from normal human lymphocytes, which have two copies of each gene.

Classifier 4 was trained to distinguish human lymphocytes from the A549 and H460 cancer cell lines based on the number of copies of the four FISH probes.


Fig. 5. Illustration of the image analysis workflow. (a) Color composite of the three acquired channels (scale bar 20 μm). (b) Result of the immunophenotyping step: the N are shown in green, the Cl in red, and the AF in blue (scale bar 20 μm). (c) Result of the cytogenetic step, where the FISH signals are detected and analyzed (scale bar 5 μm). (d) Reconstructed nucleus with the signals from the four FISH probes.

E. Validation

Two samples were prepared (see Section II-A2) for the validation of the platform and the analysis algorithm, each containing a mixture of BAL from a subject who is not suspected of having lung cancer with the A549 and H460 cancer cell lines. The area of interest was 5 × 5 low-magnification fields of view, corresponding to a slide area of 0.55 mm².

III. RESULTS

The performance of each classifier (Classifiers 1–4) is shown separately, and then the final validation experiment represents the combined performance of the system. An example of the image analysis workflow is shown in Fig. 5.

A. NMF Classifier (Classifier 1)

The classifier was evaluated using the true positive rate (TPR) versus the false positive rate (FPR) in a standard receiver operating characteristic (ROC) curve. The working point of the classifier was chosen to yield the minimum expected FPR with a TPR above 95% (see Fig. 6). The classifier that is based on the SF was compared with thresholding the BLUE channel only (DT) and thresholding the color ratio (TR) that is computed as BLUE/(BLUE + GREEN + RED). The DT obtains the worst performance, with an area under the ROC curve (AUR) between 0.43 and 0.89. The TR only marginally improves the classification, with an AUR between 0.56 and 0.98. Our proposed algorithm (the SF) performs the best and achieves an AUR between 0.91 and 1.00.
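The choice of the working point can be sketched as a search over candidate thresholds: among all thresholds whose TPR reaches the target, keep the one with the lowest FPR. The sketch assumes that larger scores indicate the positive class and that the target is inclusive (TPR of at least 0.95); the function names, the scores, and the labels are hypothetical illustration data, not measurements.

```python
def roc_points(scores, labels):
    """Return (threshold, TPR, FPR) for every distinct score used as a threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for threshold in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
        points.append((threshold, tp / positives, fp / negatives))
    return points


def working_point(scores, labels, min_tpr=0.95):
    """Threshold with the lowest FPR among those reaching the target TPR."""
    candidates = [p for p in roc_points(scores, labels) if p[1] >= min_tpr]
    return min(candidates, key=lambda p: (p[2], -p[1]))


# Hypothetical per-object scores (e.g., an SF-like value) and manual labels (1 = stained).
scores = [0.2, 0.4, 0.6, 0.8, 1.1, 1.3, 1.7, 2.5]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

threshold, tpr, fpr = working_point(scores, labels)
```

With these toy values the selected threshold is 0.8, which keeps every positive object (TPR of 1.0) at the cost of one false positive out of four negatives (FPR of 0.25).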

B. NCG Classifier (Classifier 2)

Fig. 6. Performance of the NMF-based classification. Both the ROC training curves and the results on the validation set are shown for (a) the synthetic data set, (b) the H460 cell line, and (c) the A549 cell line. SF indicates the data for our algorithm. TR indicates the results for the color ratio. DT indicates the results of thresholding the BLUE channel. In the three cases, the target TPR at the point of work was set to 0.95 and can be seen in the graph as a dotted horizontal line. The markers (+, ∗, x) indicate the performance on the validation data, which is also shown in the table. As an additional performance measure, the table also shows the AUR, which is a standard measure to compare classifiers.

The NCG classifier is parameterized using two weights $C_1$ and $C_2$ for two of the three free margins, with one for each class [12]. These two weights were tuned to give the highest accuracy and the lowest FOC (see Section II) by solving an optimization problem with the following objective:

$$f(C_1, C_2) = \mathrm{Acc}(C_1, C_2) - \mathrm{FOC}(C_1, C_2)/2 \tag{1}$$

where $\mathrm{Acc}(C_1, C_2)$ and $\mathrm{FOC}(C_1, C_2)$ indicate the cross-validated accuracy and FOC, respectively, of the SVM that is trained with weights $C_1$ and $C_2$. Fig. 7 shows the cross-validated accuracy (the leave-one-out method [15]) for 100 × 100 values of the $C_1$ and $C_2$ parameters. In our case, the optimal combination is $C_1 = C_2 = 2$, which yields an accuracy of 85%.
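The tuning of the two weights can be sketched as an exhaustive grid search that maximizes the objective in (1). The sketch assumes that accuracy and FOC are expressed on comparable scales and that an evaluation function returns both for a given pair of weights. The function `toy_evaluate` is a hypothetical stand-in with made-up smooth surfaces; it does not reproduce the cross-validated SVM results.

```python
import itertools


def objective(accuracy, foc):
    """Objective (1): reward accuracy and penalize half of the figure of cost."""
    return accuracy - foc / 2


def grid_search(evaluate, c1_values, c2_values):
    """Return the (c1, c2) pair with the highest objective.

    `evaluate(c1, c2)` must return (accuracy, foc), e.g., from cross-validation.
    """
    return max(itertools.product(c1_values, c2_values),
               key=lambda weights: objective(*evaluate(*weights)))


def toy_evaluate(c1, c2):
    # Hypothetical surfaces for illustration only, NOT measured SVM results.
    accuracy = 0.9 - 0.01 * ((c1 - 3) ** 2 + (c2 - 1) ** 2)
    foc = 0.02 * (abs(c1 - 3) + abs(c2 - 1))
    return accuracy, foc


best_c1, best_c2 = grid_search(toy_evaluate, range(1, 6), range(1, 6))
```

In practice, `evaluate` would train the classifier with the given weights and return its leave-one-out accuracy and FOC, which makes each grid point expensive and motivates keeping the grid coarse or caching results.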

C. FISH Probe Classifiers (Classifier 3a and 3b)

The accuracy of the FISH probe classifier was evaluated on 30 nuclei from human lymphocytes, which contain two copies of each gene. When performed correctly, the spot-counting algorithm should give two copies in each channel. In particular, 15 of the 30 nuclei were used to train the NBC, and the remaining 15 nuclei were used to evaluate the performance of the spot counting (see Fig. 8).

As shown in the figure, the spot counting for the SpectrumAqua and SpectrumGold fluorophores is correct in 93% (14/15) and 100% (15/15) of the nuclei, respectively. SpectrumRed suffers from having weak signal intensity and from some crosstalk from the SpectrumGold fluorophore but still is correctly classified in 87% (13/15) of the nuclei. SpectrumGreen suffers from both crosstalk and autofluorescence, which results in over-detection and an accuracy of 73% (11/15). The overall accuracy of the classification is 90% (54/60).


Fig. 7. (a) Accuracy and (b) figure of cost of the NCG classifier, which are measured using the leave-one-out method.

Fig. 8. Results of the spot-counting algorithm on the validation set of 15 nuclei of human lymphocytes.

D. Validation Experiment

After separately testing each software module and each classifier, we evaluated the complete workflow by analyzing two samples containing BAL samples mixed with either the A549 or the H460 cell line. To validate the system, all objects were manually labeled as AF, G, Cl, positive or negative nuclei (i.e., +N and −N, respectively), or dubious (marked with a question mark). This manual classification was compared with the automated classification (see the confusion matrix in Table III). Cl were not further processed; therefore, the automatic classification does not distinguish whether Cl are positive (P) or negative (N). Consequently, there is no information on the misclassification of positive or negative nuclei (+N or −N, respectively) as Cl.

Table IV summarizes the results by grouping the classes into P or N. The P group contains +N, whereas the N group contains AF, G, and −N. Note that, as we have no information on the positivity of Cl, these are not assigned to either group.

From the summary confusion matrix, we extract the following performance indicators: accuracy, sensitivity, specificity, the positive predictive value (PPV), and the negative predictive value (NPV). The performance indicators are shown in Table V.
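As a brief illustration of how these indicators follow from a two-class confusion matrix, the standard definitions can be written out in Python. The counts used below are an assumption made for illustration: they are chosen to be consistent with the figures quoted in the Discussion (44 of 49 positives detected, 17 false positives among 212 negatives) and are not a transcription of Table IV. The function name `performance_indicators` is hypothetical.

```python
def performance_indicators(tp, fp, tn, fn):
    """Standard binary-classification indicators from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # detected positives / all positives
        "specificity": tn / (tn + fp),   # rejected negatives / all negatives
        "ppv": tp / (tp + fp),           # true positives / all predicted positives
        "npv": tn / (tn + fn),           # true negatives / all predicted negatives
    }


# Illustrative counts (assumption, see the paragraph above).
indicators = performance_indicators(tp=44, fp=17, tn=195, fn=5)
for name, value in indicators.items():
    print(f"{name}: {value:.1%}")
```

Under these assumed counts, the sensitivity and accuracy come out at 89.8% and 91.6%, which agrees with the values reported in the Discussion.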

The time that is required to perform the different steps of the analysis is shown in Fig. 9. The diagram also presents how the different steps have been overlapped in time to maximize the throughput. The total time that is needed to perform the entire analysis has been reduced from an original 7 h and 56 min to just over 4 h. Note that the bottleneck is the 3-D acquisition (30 planes, 5 channels per object, 500 ms per plane) whose overhead is mainly due to the time that is required to mechanically change the filters. The analysis has been performed on three workstations for a total of eight processing units.
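The effect of overlapping the steps can be illustrated with a simple pipeline model in Python. This is a sketch under stated assumptions and does not reproduce the schedule of Fig. 9: the stage durations are hypothetical (75 s of acquisition per object, assuming 30 planes × 5 channels × 0.5 s and ignoring filter changes, followed by an invented 40 s of analysis), and the function names are chosen for illustration.

```python
def sequential_time(items):
    """Total time when every stage of every object runs one after another."""
    return sum(sum(durations) for durations in items)


def pipelined_time(items):
    """Makespan when stages overlap across objects.

    Each stage handles one object at a time; stage k of an object starts once
    stage k-1 of the same object and stage k of the previous object are done.
    """
    if not items:
        return 0.0
    stage_free = [0.0] * len(items[0])   # time at which each stage becomes idle
    for durations in items:
        ready = 0.0                      # time at which this object leaves the previous stage
        for k, d in enumerate(durations):
            start = max(ready, stage_free[k])
            stage_free[k] = start + d
            ready = stage_free[k]
    return stage_free[-1]


# Three objects, each needing (acquisition, analysis) seconds; hypothetical values.
objects = [(75.0, 40.0)] * 3
print(sequential_time(objects), pipelined_time(objects))
```

In this model the overlapped total approaches the time of the slowest stage multiplied by the number of objects, which is why the acquisition step dominates once the analysis runs concurrently.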

IV. DISCUSSION

Since the 1970s, there has been great interest in applying the computational power of computers to the analysis of microscopy images [3], [16]–[23]. Some automated image analysis systems [20] that were developed in the late 1970s (e.g., the Magiscan by Joyce–Loebl) were clinically used for chromosome analysis on images of metaphase cell spreads. Recent progress in image analysis algorithms and faster controllers for the microscope and cameras have improved these early systems, giving rise to sophisticated instruments such as Isis (MetaSystems, Altlussheim, Germany), which is suited for the analysis of samples stained with FISH, or Ariol (Genetix, San Jose, CA, USA), which is a generic 2-D automated microscopy platform. High-throughput platforms have also been developed for general-purpose cellular screens, such as the ArrayScan (Cellomics, Pittsburgh, PA, USA) or the InCell Analyzer (General Electric Healthcare, Buckinghamshire, U.K.). Metamorph (Molecular Devices, Sunnyvale, CA, USA) is a software solution that can be coupled to a variety of automated microscope components and is able to perform relatively complex tasks through the use of journals (a structured scripting language).

In academic environments, a number of automated prototypes have been developed and reported in scientific journals. Netten et al. [24] developed an automated microscopy system for counting FISH-stained centromeric probes in lymphocytes from cultured blood. Their results show good concordance with those provided by a manual operator, since in 89% of the nuclei, the automatic FISH count was correct. The remaining 11% is attributed to overlapping, clustered, missed, or false signals, out-of-focus dots, and debris.

TABLE III. MANUAL VERSUS AUTOMATIC CLASSIFICATION PERFORMANCE

TABLE IV. SUMMARY CONFUSION MATRIX

TABLE V. PERFORMANCE INDICATORS

Ortiz-de-Solórzano also developed a system for the automatic quantification of immunofluorescence images [19], which was applied to the analysis of FISH images [25].

Recently, an automated system has been proposed [26] to perform the imaging of live cells in two steps: A first scan would analyze a set of images acquired in a single channel and search for interesting cells using machine learning algorithms, whereas a second scan would revisit those cells, acquiring them in multiple channels. The images resulting from this second phase are stored for further analysis.

Each system was designed to answer a specific question and is generally closed to extensions. No system developed in the past decades fulfills all the requirements of our particular application. Therefore, we developed our own customized automated platform.

This paper does not include a quantitative comparison of our CAD system with comparable ones; to our knowledge, however, this is the first time such an integrated approach has been undertaken.

Our application consists of finding low-probability tumoral cells in BAL samples as a tool for the CAD of lung cancer. To properly train and evaluate the performance of our system, it was tested by searching for the isolated cells of a known type (A549 and H460) within a BAL sample that is obtained from an individual without lung cancer. The acquisition and analysis protocol was implemented using our automated platform, distributing the computational effort on two machines. The same software was used on both workstations, and the assignment of the particular step to execute was performed by a single selection.

The platform supports complex protocols, which, in the current prototype, are easily developed using the common MATLAB language. Image analysis routines are integrated and can be directly used during the acquisition. The hardware layer is abstracted; thus, the code can be easily ported to and reused in a different hardware setup. Java and MATLAB are both cross-platform; therefore, although the platform has been developed on Linux machines, it can readily be made platform independent.

The platform was validated on an independent set of samples, showing an overall accuracy of 91.6%. The sensitivity is 89.8%, meaning that it was able to detect 44 of the 49 cancer cells.


Fig. 9. Timing and parallelization diagram for the different steps of the image acquisition and analysis protocol.

To further improve the platform, a series of changes have been envisioned. The current prototype uses MATLAB to control high-level operations. Although it is very widespread, it is still a proprietary language. Open alternatives such as Octave² or NumPy³ would allow the platform to be adopted more widely. The main hurdle for this is the absence of an image analysis library that is equivalent to DipImage/DipLib. The platform has been initially implemented in a combination of Java and MATLAB for rapid prototyping. As soon as the algorithms and the platform become more stable, portions of the MATLAB code could be smoothly ported to Java and still be used from MATLAB.

The hardware drivers were programmed from scratch, which hinders the adoption of new hardware, as new drivers would need to be programmed ad hoc. The integration of open alternatives, such as μManager⁴ [27], would help leverage the wide range of available drivers.

Regarding the analysis protocol, the current classifier combination at the first pass is sensitive enough not to discard any positive nucleus (49/49) but still suffers from a number of false positives (17/212). Furthermore, 5 of the 49 cancer cells did not pass the second test. The combination of the two classifiers in a single multiparameter classifier may yield better separation between classes and reduce the rate of false positives while keeping the same level of accuracy.

Our platform, which enables the use of multiple magnifications or information at different scales, can be used in other applications, for instance, in the automated analysis of tissue microarrays. The entire microarray can be mapped at one magnification and then revisited at greater magnification for further analysis.

Other samples may also be analyzed with the system, such as fine needle aspirates or circulating cancer cells.

In summary, we believe that we have proposed a valuable tool for the analysis of biological samples via automated microscopy. Our CAD system has been successfully applied to the detection of tumoral cells from two lung-cancer cell lines in samples from BAL. Its potential and limitations will be better understood once further trials are performed for the current application and once new ones are designed and tested.

² http://www.gnu.org/software/octave
³ http://numpy.scipy.org
⁴ http://valelab.ucsf.edu/~MM/MMwiki/

REFERENCES

[1] A. Jemal, F. Bray, M. M. Center, J. Ferlay, E. Ward, and D. Forman, “Global cancer statistics,” CA Cancer J. Clin., vol. 61, no. 2, pp. 69–90, Feb. 2011.

[2] A. Jemal, M. J. Thun, L. A. Ries, H. L. Howe, H. K. Weir, M. M. Center, E. Ward, X. C. Wu, C. Eheman, R. Anderson, U. A. Ajani, B. Kohler, and B. K. Edwards, “Annual report to the nation on the status of cancer, 1975–2005, featuring trends in lung cancer, tobacco use, and tobacco control,” J. Nat. Cancer Inst., vol. 100, no. 23, pp. 1672–1694, Dec. 2008.

[3] S. C. Springmeyer, R. Hackman, J. J. Carlson, and J. E. McClellan, “Bronchiolo-alveolar cell carcinoma diagnosed by bronchoalveolar lavage,” Chest, vol. 83, no. 2, pp. 278–279, 1983.

[4] M. P. Rivera and A. C. Mehta, “Initial diagnosis of lung cancer: ACCP evidence-based clinical practice guidelines (2nd edition),” Chest, vol. 132, pp. 131S–148S, Sep. 2007.

[5] M. Campos, C. Prior, F. Warleta, I. Zudaire, J. Ruíz-Mora, R. Catena, A. Calvo, and J. J. Gaforio, “Phenotypic and genetic characterization of circulating tumor cells by combining immunomagnetic selection and FICTION techniques,” J. Histochem. Cytochem., vol. 56, no. 7, pp. 667–675, Jul. 2008.

[6] M. J. Pajares, I. Zudaire, M. D. Lozano, J. Agorreta, G. Bastarrika, W. Torre, A. Remirez, R. Pio, J. J. Zulueta, and L. M. Montuenga, “Molecular profiling of computed tomography screen-detected lung nodules shows multiple malignant features,” Cancer Epidemiol. Biomarkers Prev., vol. 15, no. 2, pp. 373–380, Feb. 2006.

[7] D. M. Ritchie, “The development of the C language,” in Proc. 2nd ACM SIGPLAN Conf. HOPL-II, 1993, pp. 201–208.

[8] P. J. Denning and J. P. Buzen, “The operational analysis of queueing network models,” ACM Comput. Surv., vol. 10, no. 3, pp. 225–261, Sep. 1978.

[9] T. Pengo, A. Munoz-Barrutia, and C. Ortiz-De-Solorzano, “Halton sampling for autofocus,” J. Microsc., vol. 235, no. 1, pp. 50–58, Jul. 2009.

[10] D. Sage, F. R. Neumann, F. Hediger, S. M. Gasser, and M. Unser, “Automatic tracking of individual fluorescence particles: Application to the study of chromosome dynamics,” IEEE Trans. Image Process., vol. 14, no. 9, pp. 1372–1383, Sep. 2005.

[11] D. D. Lee and H. S. Seung, “Learning the parts of objects by non-negative matrix factorization,” Nature, vol. 401, no. 6755, pp. 788–791, Oct. 1999.

[12] V. N. Vapnik, “An overview of statistical learning theory,” IEEE Trans. Neural Netw., vol. 10, no. 5, pp. 988–999, Sep. 1999.

[13] G. M. P. van Kempen, H. T. M. van der Voort, and L. J. van Vliet, “A quantitative comparison of two restoration methods as applied to confocal microscopy,” J. Microsc., vol. 185, no. 3, pp. 354–365, Mar. 1997.

[14] N. Otsu, “A threshold selection method from gray-level histograms,” IEEE Trans. Syst., Man, Cybern., vol. SMC-9, no. 1, pp. 62–66, Jan. 1979.

[15] K. Fukunaga, Introduction to Statistical Pattern Recognition. San Diego, CA, USA: Academic, 1990.

[16] D. K. Green, R. Bayley, and D. Rutovitz, “A cytogeneticist’s microscope,” Microsc. Acta, vol. 79, no. 3, pp. 237–245, May 2007.

[17] D. K. Green and P. W. Neurath, “The design, operation and evaluation of a high speed automatic metaphase finder,” J. Histochem. Cytochem., vol. 22, no. 7, pp. 531–535, Jul. 1974.

[18] M. Kozubek, P. Matula, P. Matula, and S. Kozubek, “Automated acquisition and processing of multidimensional image data in confocal in vivo microscopy,” Microsc. Res. Tech., vol. 64, no. 2, pp. 164–175, Jun. 2004.

[19] C. Ortiz-de-Solórzano, “Análisis y caracterización de imágenes de microscopía de muy baja luminosidad: Automatización del análisis citogenético,” Ph.D. dissertation, Universidad Politécnica de Madrid, Madrid, Spain, 1996.


[20] J. Philip and C. Lundsteen, “Semiautomated chromosome analysis, A clinical test,” Clin. Genet., vol. 27, no. 2, pp. 140–146, Feb. 1985.

[21] A. Santos, C. Ortiz de Solorzano, J. J. Vaquero, J. M. Pena, N. Malpica, and F. Del Pozo, “Evaluation of autofocus functions in molecular cytogenetic analysis,” J. Microsc., vol. 188, pp. 264–272, Dec. 1997.

[22] E. M. van Ingen, N. Verwoerd, and J. S. Ploem, “LEYTAS-2: A hybrid system for the analysis of cytological preparations, using both hardware and software methods,” Microsc. Acta Suppl., vol. Suppl 4, pp. 3–14, 1980.

[23] J. Vrolijk, P. L. Pearson, and J. S. Ploem, “LEYTAS: A system for the processing of microscopic images,” Anal. Quant. Cytol., vol. 2, no. 1, pp. 41–48, Mar./Apr. 1980.

[24] H. Netten, I. T. Young, L. J. van Vliet, H. J. Tanke, H. Vroljik, and W. C. Sloos, “FISH and chips: Automation of fluorescent dot counting in interphase cell nuclei,” Cytometry, vol. 28, no. 1, pp. 1–10, May 1997.

[25] C. Ortiz-de-Solórzano, A. Santos, I. Vallcorba, J. M. Garcia-Sagredo, and F. del Pozo, “Automated FISH spot counting in interphase nuclei: Statistical validation and data correction,” Cytometry, vol. 31, no. 2, pp. 93–99, Feb. 1998.

[26] C. Conrad, A. Wünsche, T. H. Tan, J. Bulkescher, F. Sieckmann, F. Verissimo, A. Edelstein, T. Walter, U. Liebel, T. Pepperkok, and J. Ellenberg, “Micropilot: Automation of fluorescence microscopy–based imaging for systems biology,” Nat. Methods, vol. 8, no. 3, pp. 246–259, Mar. 2011.

[27] A. Edelstein, N. Amodaj, K. Hooverk, R. Vale, and N. Stuurman, “Computer control of microscopes using μManager,” Current Protoc. Mol. Biol., ch. 92, pp. 14.20.1–14.20.17, Oct. 2010.

Thomas Pengo was born in Padua, Italy. He received the B.Sc. and M.Sc. degrees in computer engineering from Politecnico di Milano, Milan, Italy, in 2003 and 2005, respectively, and the Ph.D. degree from the University of Navarra, Pamplona, Spain, in 2010.

During his doctoral years, he was a Visiting Scientist with the Centre for Biomedical Image Analysis, Faculty of Informatics, Masaryk University, Brno, Czech Republic, and with the Sudar Laboratory, Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA.

After working on the analysis of super-resolution microscopy images for two years as a Postdoctoral Researcher with the Laboratory for Experimental Biophysics, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland, he is now with the Advanced Light Microscopy Unit, Center for Genomic Regulation (CRG), Barcelona, Spain, in the Luis Serrano Group.

Arrate Muñoz-Barrutía (S’99–M’03–SM’10) received the M.S. degree in telecommunication engineering from the Public University of Navarra, Pamplona, Spain, in 1997 and the Ph.D. degree from the Swiss Federal Institute of Technology Lausanne (EPFL), Lausanne, Switzerland, in 2002.

She is currently a Staff Scientist with the Center for Applied Medical Research (CIMA), University of Navarra, Pamplona, and an Associate Professor with the School of Engineering TECNUN, University of Navarra, San Sebastián, Spain.

Dr. Muñoz-Barrutía is the author of more than 60 peer-reviewed international journal and top conference publications in biomedical image processing. From 2005 to 2010, she held a Ramón y Cajal Fellowship from the Spanish Ministry of Science and Technology.

Carlos Ortiz-de-Solórzano (M’98–SM’07) was born in Leon, Spain, in 1967. He received the B.Eng. and Ph.D. degrees in telecommunication engineering from the Universidad Politécnica de Madrid, Madrid, Spain, in 1992 and 1996, respectively.

From 1997 to 2000, he was a Postdoctoral Fellow with the Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA, USA. From 2000 to 2004, he was a Staff Scientist and a Principal Investigator with LBNL’s Bioimaging Group. He is currently a Staff Scientist with the Oncology Division and the Head of the Morphology and Imaging Laboratory, Center for Applied Medical Research (CIMA), University of Navarra, Pamplona, Spain, and an Associate Professor with the School of Engineering TECNUN, University of Navarra, San Sebastián, Spain. He is the author of over 60 papers in a wide range of peer-reviewed journals. His research interests include quantitative fluorescence microscopy and image analysis, with applications to studying the molecular mechanisms of normal tissue development and cancer.