rapid object detection using a boosted cascade of simple features

Rapid Object Detection using a Boosted Cascade of Simple Features

Original Author: Paul Viola & Michael Jones

In: Proc. Conf. Computer Vision and Pattern Recognition, Volume 1, Kauai, HI, USA (2001) 511–518

Speaker: Jing Ming Chiuan (井民全)

Rapid: moving or acting with great speed.

Boost: to increase the strength or value of something.

Outline
Introduction
The Boost algorithm for classifier learning
  Feature selection
  Weak learner constructor
  The strong classifier
A tremendously difficult problem
Result
Conclusion

What have we done?
A machine learning approach for visual object detection that is capable of processing images extremely rapidly while achieving high detection rates.

Three key contributions
1. A new image representation, the integral image, which speeds up feature evaluation.
2. A learning algorithm based on AdaBoost [5], which selects a small number of visual features from a larger set to yield efficient classifiers.
3. A method for combining classifiers in a cascade, which quickly discards the background regions of the image.

The system works only with a single greyscale image.

A demonstration on face detection
A frontal face detection system: the detector runs at 15 frames per second on 384 x 288 images on a 700 MHz Pentium III, without resorting to image differencing or skin color detection.
(Image differencing: taking the difference between successive images in a video sequence.)

The broad practical applications for an extremely fast face detector: user interfaces, image databases and teleconferencing.

The system can be implemented on small, low-power devices: on a Compaq iPaq it runs at 2 frames/sec.

Training process for the classifier
The attentional operator is trained to detect examples of a particular class, which is a supervised training process.
In the domain of face detection, a face classifier constructed this way achieves < 1% false negatives and < 40% false positives.

Cascaded detection process
The sub-windows are processed by a sequence of classifiers, each slightly more complex than the last.
If any classifier rejects a sub-window, no further processing is performed on it.
The process is essentially that of a degenerate decision tree.
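The early-exit behavior of the cascade can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: each stage is modeled as a hypothetical function that returns True (pass the sub-window on) or False (reject), and the "window" is reduced to a single score so the example stays self-contained.

```python
from typing import Callable, Sequence

# Hypothetical stage type: True passes the sub-window on, False rejects it.
Stage = Callable[[float], bool]


def cascade_accepts(window: float, stages: Sequence[Stage]) -> bool:
    """Evaluate the stages in order and stop at the first rejection."""
    for stage in stages:
        if not stage(window):
            return False  # rejected: no further processing is performed
    return True  # every stage passed: report a detection


# Toy stages on a scalar score; a real cascade would evaluate boosted
# classifiers of increasing complexity on image features instead.
stages = [lambda w: w > 0.1, lambda w: w > 0.5, lambda w: w > 0.9]

print(cascade_accepts(0.95, stages))  # True
print(cascade_accepts(0.30, stages))  # False (rejected by the second stage)
```

Because most sub-windows are rejected by the first one or two stages, the later, more expensive stages run only on the few promising sub-windows.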

Our object detection framework
(Figure: the detection pipeline.)
1. Original image.
2. Integral image, in order to compute features rapidly at many scales.
3. Haar basis functions: feature evaluation.
4. Modified AdaBoost procedure: feature selection, reducing a large number of features to a small set of critical features.
5. Cascaded classifier structure.

Feature Selection
The detection process is based on features rather than on pixels directly, for two reasons:
1. Features can encode ad-hoc domain knowledge that is difficult to learn using a finite quantity of training data.
2. A feature-based system operates much faster than a pixel-based system.

The simple features used are reminiscent of the Haar basis functions used by Papageorgiou et al. [9].

Three kinds of features are used:

Two-rectangle feature: the difference between the sums of the pixels within two rectangular regions. The regions have the same size and shape and are horizontally or vertically adjacent.

Three-rectangle feature: the sum within two outside rectangles subtracted from the sum in a center rectangle.

Four-rectangle feature: the difference between diagonal pairs of rectangles.

The base resolution of the detector is 24x24, so the exhaustive set of rectangle features is large: over 180,000.
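To make the three feature types concrete, here is a minimal Python sketch that computes each one by summing pixels directly on a small image stored as a list of rows (`img[y][x]`). The function names, the horizontal layout of the two- and three-rectangle features, and the sign convention (which regions are subtracted) are assumptions for illustration; the slide only defines each feature as a difference of region sums.

```python
def region_sum(img, x, y, w, h):
    """Sum of the w x h rectangle whose top-left pixel is column x, row y."""
    return sum(img[r][c] for r in range(y, y + h) for c in range(x, x + w))


def two_rectangle(img, x, y, w, h):
    """Two same-size, horizontally adjacent regions: left sum minus right sum."""
    return region_sum(img, x, y, w, h) - region_sum(img, x + w, y, w, h)


def three_rectangle(img, x, y, w, h):
    """Center region sum minus the sum of the two outside regions."""
    outside = region_sum(img, x, y, w, h) + region_sum(img, x + 2 * w, y, w, h)
    return region_sum(img, x + w, y, w, h) - outside


def four_rectangle(img, x, y, w, h):
    """Diagonal pairs of a 2 x 2 grid: (top-left + bottom-right) - (top-right + bottom-left)."""
    main = region_sum(img, x, y, w, h) + region_sum(img, x + w, y + h, w, h)
    anti = region_sum(img, x + w, y, w, h) + region_sum(img, x, y + h, w, h)
    return main - anti


img = [[1, 2, 5],
       [3, 4, 6],
       [7, 8, 9]]
print(two_rectangle(img, 0, 0, 1, 3))    # -3  (11 - 14)
print(three_rectangle(img, 0, 0, 1, 3))  # -17 (14 - (11 + 20))
print(four_rectangle(img, 0, 0, 1, 1))   # 0   ((1 + 4) - (2 + 3))
```

Summing pixels directly costs time proportional to the rectangle area, which is exactly what the integral image avoids.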

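Integral Image
An intermediate representation for rapidly computing the rectangle features. The integral image at location (x, y) contains the sum of the pixels above and to the left of (x, y), inclusive:

$$ii(x, y) = \sum_{x' \le x,\; y' \le y} i(x', y')$$

where ii(x, y) is the integral image and i(x, y) is the original image.

The pair of recurrences for computing it in one pass over the original image:

$$s(x, y) = s(x, y-1) + i(x, y)$$
$$ii(x, y) = ii(x-1, y) + s(x, y)$$

where s(x, y) is the cumulative row sum, s(x, -1) = 0 and ii(-1, y) = 0.

Example (x indexes columns, y indexes rows):

Original image i    Cumulative row sum s    Integral image ii
 1  2  5             1  2  5                 1  3  8
 3  4  6             4  6 11                 4 10 21
 7  8  9            11 14 20                11 25 45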
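The two recurrences translate directly into a single loop over the image. A minimal Python sketch using plain lists (the function name is hypothetical), run on the 3 x 3 example from this slide so the printed arrays can be compared with the table:

```python
def integral_image(img):
    """Return (s, ii) computed in one pass with the two recurrences.

    img is a list of rows, so img[y][x] is i(x, y).
    """
    height, width = len(img), len(img[0])
    s = [[0] * width for _ in range(height)]   # cumulative row sum s(x, y)
    ii = [[0] * width for _ in range(height)]  # integral image ii(x, y)
    for y in range(height):
        for x in range(width):
            above = s[y - 1][x] if y > 0 else 0  # s(x, -1) = 0
            s[y][x] = above + img[y][x]
            left = ii[y][x - 1] if x > 0 else 0  # ii(-1, y) = 0
            ii[y][x] = left + s[y][x]
    return s, ii


s, ii = integral_image([[1, 2, 5], [3, 4, 6], [7, 8, 9]])
print(s)   # [[1, 2, 5], [4, 6, 11], [11, 14, 20]]
print(ii)  # [[1, 3, 8], [4, 10, 21], [11, 25, 45]]
```

Each pixel is visited once and each step does two additions, so the whole integral image costs O(width x height).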

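Calculating any rectangle sum with the integral image
In a 2 x 2 layout of rectangles A (top-left), B (top-right), C (bottom-left) and D (bottom-right), points 1, 2, 3 and 4 are the bottom-right corners of A, B, C and D. The value of the integral image is A at point 1, A + B at point 2, A + C at point 3 and A + B + C + D at point 4.

Rectangle sum D = 4 - 3 - 2 + 1, i.e. (A + B + C + D) - (A + C) - (A + B) + A, so any rectangle sum takes only four array references.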
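The four-lookup rule can be checked on the same 3 x 3 example. In this minimal Python sketch (function names are hypothetical), lookups that fall left of or above the image are treated as 0, which corresponds to an empty region A, B or C:

```python
def integral_image(img):
    """ii[y][x] = sum of img[y'][x'] for all y' <= y and x' <= x."""
    height, width = len(img), len(img[0])
    ii = [[0] * width for _ in range(height)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += img[y][x]
            ii[y][x] = row_sum + (ii[y - 1][x] if y > 0 else 0)
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of rectangle D (top-left pixel (x, y), size w x h) from four lookups."""
    def at(cx, cy):
        # Points left of or above the image correspond to empty regions.
        return ii[cy][cx] if cx >= 0 and cy >= 0 else 0

    p1 = at(x - 1, y - 1)          # A
    p2 = at(x + w - 1, y - 1)      # A + B
    p3 = at(x - 1, y + h - 1)      # A + C
    p4 = at(x + w - 1, y + h - 1)  # A + B + C + D
    return p4 - p3 - p2 + p1


ii = integral_image([[1, 2, 5], [3, 4, 6], [7, 8, 9]])
print(rect_sum(ii, 1, 1, 2, 2))  # 27 = 4 + 6 + 8 + 9
print(rect_sum(ii, 0, 0, 3, 3))  # 45 = sum of all pixels
```

A two-rectangle feature is then just the difference of two `rect_sum` calls, independent of the rectangle size.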

The AdaBoost learning algorithm is used to do the feature selection task.

Learning Classification Functions
(Figure: the learning process.) A feature set and a training set of 1. positive (face) and 2. negative (non-face) examples are fed to a variant of the AdaBoost procedure, which produces the final strong classifier. Over 180,000 rectangle features are associated with each 24x24 sub-image; the procedure selects weak learners 1, 2, 3, ... and combines them into the final strong classifier.

The Boost algorithm for classifier learning

Step 1: Given example images $(x_1, y_1), \ldots, (x_n, y_n)$, where $y_i = 0$ for negative and $y_i = 1$ for positive examples.

Step 2: Initialize the weights $w_{1,i} = \frac{1}{2m}$ for $y_i = 0$ and $w_{1,i} = \frac{1}{2l}$ for $y_i = 1$, where m and l are the number of negatives and positives respectively.

For t = 1, … , T:
1. Normalize the weights,
$$w_{t,i} \leftarrow \frac{w_{t,i}}{\sum_{j=1}^{n} w_{t,j}}$$
so that $w_t$ is a probability distribution.
2. For each feature j, train a classifier $h_j$ which is restricted to using a single feature (weak learner constructor). The error is evaluated with respect to $w_t$:
$$\epsilon_j = \sum_i w_i \, |h_j(x_i) - y_i|$$
Choose the classifier $h_t$ with the lowest error $\epsilon_t$.
3. Update the weights:
$$w_{t+1,i} = w_{t,i} \, \beta_t^{1-e_i}$$
where $e_i = 0$ if example $x_i$ is classified correctly, $e_i = 1$ otherwise, and
$$\beta_t = \frac{\epsilon_t}{1 - \epsilon_t}$$
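The loop above can be sketched in Python as follows, assuming the feature values have already been computed into a table `features[j][i]` (feature j on example i). The weak learner is a hypothetical brute-force `train_stump` that tries every observed value as the threshold with both parities; it is slow but easy to check. The stump form h(x) = 1 if p f(x) < p θ follows the weak classifier definition of the paper, while the clamp on the error is an added safeguard that is not on the slide.

```python
def stump_predict(value, theta, parity):
    """Weak classifier: 1 if parity * f(x) < parity * theta, else 0."""
    return 1 if parity * value < parity * theta else 0


def train_stump(values, labels, weights):
    """Brute-force weak learner for one feature: (weighted error, theta, parity)."""
    best = (float("inf"), 0.0, 1)
    for theta in values:
        for parity in (1, -1):
            err = sum(w * abs(stump_predict(v, theta, parity) - y)
                      for v, y, w in zip(values, labels, weights))
            if err < best[0]:
                best = (err, theta, parity)
    return best


def adaboost(features, labels, rounds):
    """features[j][i] is feature j on example i; labels[i] is 1 (face) or 0.

    Returns one (feature index, theta, parity, beta) tuple per round.
    """
    m, l = labels.count(0), labels.count(1)  # negatives, positives
    weights = [1 / (2 * m) if y == 0 else 1 / (2 * l) for y in labels]
    chosen = []
    for _ in range(rounds):
        total = sum(weights)
        weights = [w / total for w in weights]  # 1. normalize
        best = None
        for j, values in enumerate(features):   # 2. one classifier per feature
            err, theta, parity = train_stump(values, labels, weights)
            if best is None or err < best[0]:
                best = (err, j, theta, parity)
        err, j, theta, parity = best             # lowest weighted error
        err = min(max(err, 1e-10), 1 - 1e-10)    # keep beta finite and positive
        beta = err / (1 - err)
        for i, (v, y) in enumerate(zip(features[j], labels)):  # 3. update
            e = 0 if stump_predict(v, theta, parity) == y else 1
            weights[i] *= beta ** (1 - e)
        chosen.append((j, theta, parity, beta))
    return chosen


labels = [1, 1, 1, 0, 0, 0]
features = [[1, 2, 3, 4, 5, 6],   # separates faces from non-faces perfectly
            [3, 1, 2, 2, 3, 1]]   # mostly noise
print(adaboost(features, labels, rounds=2)[0][:3])  # (0, 4, 1)
```

With over 180,000 features per sub-window, the per-feature search dominates training time, which is why a faster threshold search matters in practice.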

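Weak learner constructor (illustrated)

The training set carries weights $w_1, w_2, \ldots, w_n$, and each sub-image has over 180,000 features $f_j$.

For each feature $j = 1, \ldots, 180{,}000$, a classifier $h_j$ is trained and its weighted error $\epsilon_j = \sum_i w_i \, |h_j(x_i) - y_i|$ is computed.

Choose the classifier $h_t$ with the lowest error $\epsilon_t$, the minimum over $h_1, h_2, \ldots, h_{180{,}000}$.

Normalize the weights, then update them with $w_{t+1,i} = w_{t,i} \, \beta_t^{1-e_i}$: each correctly classified example ($e_i = 0$) is multiplied by $\beta_t$, which is below 1 whenever $\epsilon_t < 1/2$, while each missed example ($e_i = 1$) keeps its weight. After the next normalization, the missed examples carry relatively more weight.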

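Training the weak learner (illustrated explanation)

For a training set X, the feature value $f_j(x)$ is computed for every example x, placing face examples and non-face examples along a single axis. A threshold $\theta_j$ splits the axis: for example, if $f_j(x) > \theta_j$, x is classified as a face ($h_j(x) = 1$). Non-faces on the face side of the threshold are false positives, and faces on the other side are false negatives. The threshold is chosen to minimize the weighted error $\sum_i w_i \, |h_j(x_i) - y_i|$.

A weak classifier $h_j(x)$ consists of a feature $f_j$, a threshold $\theta_j$ and a parity $p_j$ indicating the direction of the inequality sign:

$$h_j(x) = \begin{cases} 1 & \text{if } p_j f_j(x) < p_j \theta_j \\ 0 & \text{otherwise} \end{cases}$$

The example above ($f_j(x) > \theta_j$ means face) corresponds to $p_j = -1$.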
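Finding the best threshold and parity for one feature does not require scoring every candidate threshold from scratch. The sketch below, a hypothetical `best_threshold` helper, uses a standard sorted-sweep approach (an assumption here; the slides do not say how the threshold is found): sort the examples by feature value once, move the threshold upward, and keep running sums of positive and negative weight below it. It returns the error, threshold and parity for h(x) = 1 if p f(x) < p θ.

```python
def best_threshold(values, labels, weights):
    """Return (error, theta, parity) minimizing sum_i w_i |h(x_i) - y_i|,
    where h(x) = 1 if parity * f(x) < parity * theta else 0."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    t_pos = sum(w for w, y in zip(weights, labels) if y == 1)  # total face weight
    t_neg = sum(w for w, y in zip(weights, labels) if y == 0)  # total non-face weight
    s_pos = s_neg = 0.0  # face / non-face weight below the threshold
    best = (float("inf"), 0.0, 1)
    n = len(order)
    for k in range(n + 1):
        if k > 0:  # example order[k - 1] is now below the threshold
            i = order[k - 1]
            if labels[i] == 1:
                s_pos += weights[i]
            else:
                s_neg += weights[i]
        if 0 < k < n and values[order[k - 1]] == values[order[k]]:
            continue  # no threshold can separate equal feature values
        if k == 0:
            theta = values[order[0]] - 1.0
        elif k == n:
            theta = values[order[-1]] + 1.0
        else:
            theta = (values[order[k - 1]] + values[order[k]]) / 2
        err_face_below = s_neg + (t_pos - s_pos)  # parity +1: f(x) < theta is a face
        err_face_above = s_pos + (t_neg - s_neg)  # parity -1: f(x) > theta is a face
        if err_face_below < best[0]:
            best = (err_face_below, theta, 1)
        if err_face_above < best[0]:
            best = (err_face_above, theta, -1)
    return best


w = [1 / 6] * 6
print(best_threshold([1, 2, 3, 4, 5, 6], [1, 1, 1, 0, 0, 0], w))  # (0.0, 3.5, 1)
```

After the one-time sort, each candidate threshold costs constant work, so a feature is trained in O(n log n) instead of O(n^2).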

AdaBoosting
Places the most weight on the examples most often misclassified by the preceding weak rules, forcing the base learner to focus its attention on the “hardest” examples.
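A small numeric illustration of this effect, using the AdaBoost weight update rule ($\beta = \epsilon / (1 - \epsilon)$, weights of correctly classified examples multiplied by $\beta$, then normalization). The four equal starting weights and the choice of which example is missed are made up for illustration, and `reweight` is a hypothetical helper:

```python
def reweight(weights, correct, error):
    """AdaBoost update w <- w * beta^(1 - e), then normalize.

    beta = error / (1 - error); e = 0 for correct examples, 1 for misses.
    """
    beta = error / (1 - error)
    updated = [w * (beta if ok else 1.0) for w, ok in zip(weights, correct)]
    total = sum(updated)
    return [w / total for w in updated]


weights = [0.25, 0.25, 0.25, 0.25]
correct = [True, True, True, False]  # the weak rule misses example 4
error = sum(w for w, ok in zip(weights, correct) if not ok)  # 0.25
new_weights = reweight(weights, correct, error)
print([round(w, 3) for w in new_weights])  # [0.167, 0.167, 0.167, 0.5]
```

The single missed example goes from a quarter to half of the total weight, so the next weak rule is pushed hard to get it right.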

The Boost algorithm for classifier learning (summary)
After T rounds, the weak learner constructor has selected the weak classifiers $h_1, h_2, h_3, \ldots$, each the single-feature classifier with the lowest weighted error $\epsilon_t$ in its round, together with its $\beta_t = \epsilon_t / (1 - \epsilon_t)$. The selected weak classifiers are combined into the final strong classifier.
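The slides do not spell out the combination rule. In the original paper, the final strong classifier is a weighted vote: $h(x) = 1$ if $\sum_{t=1}^{T} \alpha_t h_t(x) \ge \frac{1}{2} \sum_{t=1}^{T} \alpha_t$ and 0 otherwise, with $\alpha_t = \log(1/\beta_t)$. A minimal Python sketch of that rule; the weak classifiers and β values in the usage example are made up:

```python
import math


def strong_classify(x, weak_classifiers, betas):
    """Weighted vote: 1 if sum_t alpha_t h_t(x) >= 0.5 * sum_t alpha_t, else 0."""
    alphas = [math.log(1.0 / b) for b in betas]  # alpha_t = log(1 / beta_t)
    score = sum(a * h(x) for a, h in zip(alphas, weak_classifiers))
    return 1 if score >= 0.5 * sum(alphas) else 0


# Made-up weak classifiers on a scalar "feature value" x, with made-up betas.
weak = [lambda x: 1 if x < 4 else 0,
        lambda x: 1 if x < 2 else 0,
        lambda x: 1 if x < 6 else 0]
betas = [0.1, 0.5, 0.5]  # a smaller beta (lower error) gives a larger vote
print(strong_classify(3, weak, betas))  # 1
print(strong_classify(5, weak, betas))  # 0
```

The half-total threshold means the strong classifier says "face" when the weak classifiers voting yes carry at least half of the total vote weight.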

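The Big Picture on the testing process
(Figure: the cascade at test time.) The AdaBoost learner selects features and builds the classifier for each stage from the feature set. Stage 1 uses a single weak classifier $h_1$; Stage 2 uses $h_1, h_2, \ldots, h_{10}$; later stages use more. A sub-window that passes a stage moves on to the next one; a sub-window that fails any stage is rejected.

Each stage rejects as many negatives as possible while keeping false negatives to a minimum, for example a 100% detection rate with a 50% false positive rate.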
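Because a sub-window must pass every stage to be detected, the overall rates of a K-stage cascade are products of the per-stage rates, $D = \prod_{i=1}^{K} d_i$ and $F = \prod_{i=1}^{K} f_i$, where $d_i$ and $f_i$ are measured on the sub-windows that reach stage i. A quick check in Python; the 10-stage, 99% / 50% figures are illustrative assumptions, not numbers from the paper:

```python
from math import prod


def cascade_rates(detection_rates, false_positive_rates):
    """Overall rates of a cascade: D = prod(d_i), F = prod(f_i).

    Each d_i and f_i is measured on the sub-windows that reach stage i.
    """
    return prod(detection_rates), prod(false_positive_rates)


# Illustrative numbers: 10 stages, each keeping 99% of faces and 50% of non-faces.
D, F = cascade_rates([0.99] * 10, [0.5] * 10)
print(f"D = {D:.3f}, F = {F:.6f}")  # D = 0.904, F = 0.000977
```

This is why each stage can tolerate a high false positive rate: the product shrinks F quickly, while per-stage detection rates must stay very close to 1 to keep D high.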

A tremendously difficult problem
How to determine:
the number of classifier stages,
the number of features in each stage,
the threshold of each stage.

(Figure: training the cascade. Face and non-face training examples pass through Stage 1 ($h_1$) and Stage 2 ($h_1, h_2, \ldots, h_{10}$); each stage targets a 100% detection rate with a 50% false positive rate, and rejected examples stop there.)
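One simple way to set a stage threshold, sketched below as an assumption (the slide leaves the question open), is to score held-out face and non-face examples with the stage's strong classifier, lower the threshold until the stage reaches its target detection rate, and then read off the false positive rate that results. The helper name `choose_stage_threshold` and the scores are hypothetical:

```python
import math


def choose_stage_threshold(pos_scores, neg_scores, min_detection_rate):
    """Highest threshold that still passes at least min_detection_rate of the
    positives; returns (threshold, detection rate, false positive rate)."""
    ranked = sorted(pos_scores, reverse=True)
    # Number of positives that must score at or above the threshold
    # (the tiny epsilon guards against floating-point error in the product).
    keep = max(1, math.ceil(min_detection_rate * len(ranked) - 1e-9))
    threshold = ranked[keep - 1]
    detection = sum(s >= threshold for s in pos_scores) / len(pos_scores)
    false_pos = sum(s >= threshold for s in neg_scores) / len(neg_scores)
    return threshold, detection, false_pos


faces = [0.9, 0.8, 0.6, 0.4]      # hypothetical stage scores on faces
non_faces = [0.7, 0.3, 0.2, 0.1]  # and on non-faces
print(choose_stage_threshold(faces, non_faces, 1.0))   # (0.4, 1.0, 0.25)
print(choose_stage_threshold(faces, non_faces, 0.75))  # (0.6, 0.75, 0.25)
```

If the resulting false positive rate is still above the stage target, a natural next step is to add more features to that stage and repeat; that loop is also an assumption, not something stated on these slides.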

Result
A 38-layer cascaded classifier was trained to detect frontal upright faces.

Training set:
Face: 4916 hand-labeled faces with a resolution of 24x24.
Non-face: 9544 images containing no faces (350 million sub-windows within these non-face images).

Features: the first five layers of the detector contain 1, 10, 25, 25 and 50 features. The total number of features in all layers is 6061.

Result
Each classifier in the cascade was trained on:
Face: the 4916 faces plus their vertical mirror images, 9832 images in total.
Non-face: 10,000 sub-windows (size 24x24).

Outline
Result
  Speed of the final detector
  Image processing
  Scanning the detector
  Integration of multiple detections
  Experiments on a real-world test set

Speed of the Final Detector

Result
The speed is directly related to the number of features evaluated per scanned sub-window.
On the MIT+CMU test set, an average of 10 features out of a total of 6061 are evaluated per sub-window.
On a 700 MHz Pentium III, a 384 x 288 pixel image is processed in about .067 seconds (using a starting scale of 1.25 and a step size of 1.5).
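A quick arithmetic check that these figures agree with the 15 frames per second quoted in the introduction, and of how small a fraction of the features an average sub-window touches:

```python
seconds_per_frame = 0.067       # 384 x 288 image, 700 MHz Pentium III
frames_per_second = 1 / seconds_per_frame
fraction_evaluated = 10 / 6061  # average features evaluated per sub-window

print(f"{frames_per_second:.1f} frames/s")             # 14.9 frames/s
print(f"{100 * fraction_evaluated:.2f}% of features")  # 0.16% of features
```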

Image Processing

Result
Minimize the effect of different lighting conditions: variance normalization.

Reference: http://www.ic.sunysb.edu/Stu/sewang/papers/Fingerprint%20Classification%20by%20Directional%20Fields.pdf
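The paper computes the variance of each sub-window quickly using a pair of integral images, one of the pixels and one of the squared pixels, with $\sigma^2 = \frac{1}{N}\sum x^2 - m^2$, where m is the mean and N the number of pixels; normalization is then applied to the feature values rather than to the pixels. A minimal Python sketch under those assumptions; the function names and the feature value are hypothetical:

```python
import math


def integral(img):
    """Integral image padded with a leading zero row and column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            ii[y + 1][x + 1] = img[y][x] + ii[y][x + 1] + ii[y + 1][x] - ii[y][x]
    return ii


def box_sum(ii, x, y, w, h):
    """Sum of the w x h box with top-left pixel (x, y), from four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def window_std(ii, ii_sq, x, y, w, h):
    """Standard deviation of a sub-window: sigma^2 = mean(x^2) - mean^2."""
    n = w * h
    mean = box_sum(ii, x, y, w, h) / n
    var = box_sum(ii_sq, x, y, w, h) / n - mean * mean
    return math.sqrt(max(var, 0.0))  # clamp tiny negative rounding error


img = [[1, 2, 5], [3, 4, 6], [7, 8, 9]]
ii = integral(img)                                # built once per image
ii_sq = integral([[v * v for v in row] for row in img])
sigma = window_std(ii, ii_sq, 0, 0, 3, 3)
feature_value = 12.0                              # a made-up raw feature value
print(round(sigma, 3), round(feature_value / sigma, 3))
```

Dividing the feature value by σ gives the same result as normalizing every pixel first, at the cost of one division per feature.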

Scanning the Detector

Result
The final detector is scanned across the image at multiple scales and locations.
Scaling is achieved by scaling the detector itself rather than the image.
Good results are obtained using a set of scales a factor of 1.25 apart.
Locations are obtained by shifting the window some number of pixels Δ. If the current scale is s, the window is shifted by [sΔ], where [ ] is the rounding operation.
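Enumerating all detector positions under this scheme takes only a few lines. The sketch assumes a 24x24 base detector, square windows, round-half-up rounding and a hypothetical `scan_windows` generator; `delta` is the base step Δ and `scale_factor` is the 1.25 ratio between scales:

```python
def scan_windows(img_w, img_h, base=24, scale_factor=1.25, delta=1.0):
    """Yield (x, y, size) for every square detector window.

    The detector, not the image, is scaled; at scale s the window is
    shifted by [s * delta] pixels, rounded half up and at least 1.
    """
    scale = 1.0
    while int(base * scale + 0.5) <= min(img_w, img_h):
        size = int(base * scale + 0.5)
        step = max(1, int(scale * delta + 0.5))
        for y in range(0, img_h - size + 1, step):
            for x in range(0, img_w - size + 1, step):
                yield x, y, size
        scale *= scale_factor


windows = list(scan_windows(30, 30))
print(len(windows))  # 50: 49 windows of size 24 plus 1 of size 30
```

Scaling the step with the scale keeps the number of windows at large scales small, and with the integral image a scaled feature costs the same to evaluate as an unscaled one.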

Integration of Multiple Detections

Result
Multiple detections will usually occur around each face and around some types of false positives.

A post-process is applied to the detected sub-windows in order to combine overlapping detections into a single detection. Two detections are in the same subset if their bounding regions overlap.
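A minimal sketch of this post-process: partition the detections into connected groups of overlapping boxes with a small union-find, then report one box per group. Averaging the corners of each group follows the original paper; the (x1, y1, x2, y2) box format and the function names are assumptions:

```python
def overlaps(a, b):
    """Boxes are (x1, y1, x2, y2); True if their interiors intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def merge_detections(boxes):
    """Group transitively overlapping boxes and average each group's corners."""
    parent = list(range(len(boxes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if overlaps(boxes[i], boxes[j]):
                parent[find(i)] = find(j)  # same subset

    groups = {}
    for i, box in enumerate(boxes):
        groups.setdefault(find(i), []).append(box)
    return [tuple(sum(b[k] for b in g) / len(g) for k in range(4))
            for g in groups.values()]


print(merge_detections([(0, 0, 10, 10), (2, 2, 12, 12), (50, 50, 60, 60)]))
# [(1.0, 1.0, 11.0, 11.0), (50.0, 50.0, 60.0, 60.0)]
```

Because the subsets are connected components, a chain of boxes where each overlaps only its neighbor still collapses into a single detection.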

Experiments on a Real-World Test Set

Result

The MIT+CMU frontal face test set consists of 130 images with 507 labeled frontal faces.

Table: Detection rates for various numbers of false positives on the MIT+CMU test set containing 130 images and 507 faces.

(Figure: ROC curve for the face detector ("Our detector") on the MIT+CMU test set. x-axis: false positives; y-axis: correct detection rate.)

The detector was run using a step size of 1.0 and a starting scale of 1.0 (75,081,800 sub-windows scanned).

A simple voting scheme to further improve results

Result
Run three detectors: the 38-layer one described above plus two similarly trained detectors.
Output the majority vote of the three detectors.
The improvement would be greater if the detectors were more independent.
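A sketch of the majority vote for candidate sub-windows, assuming each detector reports a 0/1 decision for the same sub-window; matching each detector's detections to the same location is left out here, and the function name and outputs are hypothetical:

```python
def majority_vote(decisions):
    """1 if more than half of the detectors fired on this sub-window, else 0."""
    return 1 if 2 * sum(decisions) > len(decisions) else 0


# Hypothetical 0/1 outputs of three detectors for three candidate sub-windows.
outputs = {
    "window A": [1, 1, 0],
    "window B": [1, 0, 0],
    "window C": [1, 1, 1],
}
for name, decisions in outputs.items():
    print(name, majority_vote(decisions))
# window A 1
# window B 0
# window C 1
```

A false positive produced by only one of the three detectors is voted away, which is why more independent detectors would help more.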

Output of our face detector from the MIT+CMU test set

Conclusion
An object detection approach that minimizes computation time while achieving high detection rates.

This paper brings together new algorithms, representations and insights which are quite generic.

The detector is approximately 15 times faster than previous approaches.

The dataset includes faces under a very wide range of conditions, including illumination, scale, pose and camera variation.
