Rapid Object Detection using a Boosted Cascade of Simple Features
Original authors: Paul Viola and Michael Jones. In: Proc. Conf. Computer Vision and Pattern Recognition, Volume 1, Kauai, HI, USA (2001) 511–518.
Speaker: Jing Ming Chiuan (井民全)
Rapid: moving or acting with great speed
Boost: to increase the strength or value of something
Outline
Introduction
The Boost algorithm for classifier learning
  Feature selection
  Weak learner constructor
  The strong classifier
A tremendously difficult problem
Result
Conclusion
What have we done?
A machine learning approach for visual object detection
  Capable of processing images extremely rapidly
  Achieving high detection rates
Three key contributions
  A new image representation, the integral image: speeds up the feature evaluation
  A learning algorithm based on AdaBoost [5]: selects a small number of visual features from a larger set to yield an efficient classifier
  A method for combining classifiers, the cascade: discards the background regions of the image
A demonstration on face detection
A frontal face detection system
The detector runs at 15 frames per second without resorting to image differencing or skin color detection
  Image differencing: the difference between frames in a video sequence
  The detector works only with a single grey-scale image
  384 x 288 pixel images on a 700 MHz Pentium III
The broad practical applications for an extremely fast face detector
User interfaces, image databases, teleconferencing
The system can be implemented on small low-power devices
  Compaq iPaq: 2 frames per second
Training process for the classifier
The attentional operator is trained to detect examples of a particular class: a supervised training process
In the domain of face detection: fewer than 1% false negatives and fewer than 40% false positives
A face classifier is constructed
Cascaded detection process
The sub-windows are processed by a sequence of classifiers, each slightly more complex than the last
If any classifier rejects the sub-window, no further processing is performed
The process is essentially that of a degenerate decision tree
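The early-exit behaviour described above can be sketched in a few lines of Python. This is only an illustration: the stage functions here are hypothetical toy thresholds on the pixel sum, not the boosted classifiers used in the paper, and the names `cascade_classify` and `make_stage` are assumptions.

```python
def cascade_classify(window, stages):
    """Run a sub-window through the stages in order, simplest first.

    Each stage is a callable returning True (pass) or False (reject).
    The first rejection ends processing for this sub-window.
    """
    for stage in stages:
        if not stage(window):
            return False  # rejected: later, more complex stages never run
    return True  # accepted by every stage


calls = []  # records which stages were actually evaluated


def make_stage(name, threshold):
    """Hypothetical toy stage: pass if the pixel sum reaches a threshold."""
    def stage(window):
        calls.append(name)
        return sum(window) >= threshold
    return stage


stages = [make_stage("stage1", 10), make_stage("stage2", 20), make_stage("stage3", 30)]

background = cascade_classify([1, 2, 3], stages)   # rejected by stage1 only
calls_after_background = list(calls)
face_like = cascade_classify([10, 10, 10], stages)  # evaluated by all three stages
```

Most sub-windows in an image are background, so the cost per window is dominated by the first one or two stages.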
Our object detection framework
Original image -> integral image (in order to compute features rapidly at many scales)
Integral image -> feature evaluation with Haar basis functions (large number of features)
Feature evaluation -> feature selection with a modified AdaBoost procedure (small set of critical features)
Feature selection -> cascaded classifier structure
Feature Selection
The detection process is based on features rather than on the pixels directly.
Two reasons:
  Features can encode ad-hoc domain knowledge that is difficult to learn using a finite quantity of training data.
  A feature-based system operates much faster than a pixel-based one.
Simple features are used
  The features are reminiscent of the Haar basis functions, which have been used by Papageorgiou et al. [9]
  Three kinds of features
Feature Selection
Two-rectangle feature: the difference between the sums of the pixels within two rectangular regions
  The regions have the same size and shape and are horizontally or vertically adjacent
Three-rectangle feature: the sum within the two outside rectangles subtracted from the sum in the center rectangle
Four-rectangle feature: the difference between the diagonal pairs of rectangles
The base resolution is 24 x 24
  The exhaustive set of rectangle features is large, over 180,000.
Integral Image
An intermediate representation for rapidly computing the rectangle features
$$ii(x, y) = \sum_{x' \le x,\; y' \le y} i(x', y')$$
where $ii$ is the integral image and $i$ is the original image.
The pair of recurrences for computing it in one pass, where $s$ is the cumulative row sum:
$$s(x, y) = s(x, y-1) + i(x, y), \qquad ii(x, y) = ii(x-1, y) + s(x, y), \qquad s(x, -1) = 0, \quad ii(-1, y) = 0$$
Example on a 3 x 3 image:
i:
1 2 5
3 4 6
7 8 9
s:
1 2 5
4 6 11
11 14 20
ii:
1 3 8
4 10 21
11 25 45
Calculating any rectangle sum with the integral image
The value of the integral image at location 1 is the sum of the pixels in A, at location 2 it is A + B, at location 3 it is A + C, and at location 4 it is A + B + C + D
Rectangle sum D = 4 - 3 - 2 + 1
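A minimal Python sketch of the two ideas above: the one-pass recurrences and the four-lookup rectangle sum. The function names (`integral_image`, `rect_sum`, `two_rect_feature`) and the sign convention of the two-rectangle feature (left minus right) are assumptions made for illustration, not taken from the slides.

```python
def integral_image(img):
    """Return ii with ii[y][x] = sum of img[y'][x'] for all y' <= y, x' <= x."""
    height, width = len(img), len(img[0])
    ii = [[0] * width for _ in range(height)]
    s = [0] * width  # s[x] holds s(x, y) for the current row y
    for y in range(height):
        for x in range(width):
            s[x] += img[y][x]                        # s(x, y) = s(x, y-1) + i(x, y)
            left = ii[y][x - 1] if x > 0 else 0      # ii(-1, y) = 0
            ii[y][x] = left + s[x]                   # ii(x, y) = ii(x-1, y) + s(x, y)
    return ii


def rect_sum(ii, top, left, bottom, right):
    """Pixel sum over rows top..bottom and columns left..right (inclusive)."""
    def at(y, x):
        return ii[y][x] if y >= 0 and x >= 0 else 0
    # D = 4 - 3 - 2 + 1 in the slide's corner numbering
    return at(bottom, right) - at(bottom, left - 1) - at(top - 1, right) + at(top - 1, left - 1)


def two_rect_feature(ii, top, left, height, width):
    """Horizontally adjacent two-rectangle feature: left sum minus right sum."""
    bottom = top + height - 1
    left_sum = rect_sum(ii, top, left, bottom, left + width - 1)
    right_sum = rect_sum(ii, top, left + width, bottom, left + 2 * width - 1)
    return left_sum - right_sum


image = [[1, 2, 5],
         [3, 4, 6],
         [7, 8, 9]]
ii = integral_image(image)
print(ii)                        # [[1, 3, 8], [4, 10, 21], [11, 25, 45]]
print(rect_sum(ii, 1, 1, 2, 2))  # 4 + 6 + 8 + 9 = 27
```

Once `ii` is built, every rectangle sum costs four array references regardless of the rectangle's size, which is what makes evaluating features at many scales cheap.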
The AdaBoost learning algorithm is used to do the feature selection task
Learning Classification Functions
Inputs to the learning process:
  Feature set: over 180,000 rectangle features associated with each 24 x 24 sub-image
  Training set: 1. positive (face), 2. negative (non-face)
A variant of the AdaBoost procedure combines weak learner 1, weak learner 2, weak learner 3, ... into the final strong classifier
The Boost algorithm for classifier learning
Step 1: Given example images $(x_1, y_1), \ldots, (x_n, y_n)$, where $x_i$ is an image and $y_i = 1$ for positive and $y_i = 0$ for negative examples
Step 2: Initialize the weights
$$w_{1,i} = \frac{1}{2m}, \frac{1}{2l} \quad \text{for } y_i = 0, 1 \text{ respectively},$$
where $m$ and $l$ are the numbers of negatives and positives.
For t = 1, ..., T
1. Normalize the weights,
$$w_{t,i} \leftarrow \frac{w_{t,i}}{\sum_{j=1}^{n} w_{t,j}},$$
so that $w_t$ is a probability distribution.
2. For each feature j, train a classifier $h_j$ which is restricted to using a single feature (the weak learner constructor). The error is evaluated with respect to $w_t$:
$$\epsilon_j = \sum_i w_i \, |h_j(x_i) - y_i|$$
Choose the classifier $h_t$ with the lowest error $\epsilon_t$.
3. Update the weights:
$$w_{t+1,i} = \begin{cases} w_{t,i}\,\beta_t & \text{if } x_i \text{ is classified correctly} \\ w_{t,i} & \text{otherwise} \end{cases} \qquad \beta_t = \frac{\epsilon_t}{1 - \epsilon_t}$$
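The boosting loop above can be sketched in Python as follows. To keep the example short it assumes the outputs of every candidate weak classifier are already tabulated as `predictions[j][i]` = h_j(x_i); that table, the function name `adaboost_select`, and the clamping of the error away from 0 and 1 are assumptions for illustration and are not part of the slides.

```python
def adaboost_select(predictions, labels, rounds):
    """Select one weak classifier per round and reweight the examples.

    predictions[j][i] is h_j(x_i) in {0, 1}; labels[i] is y_i in {0, 1}.
    Returns a list of (j, error, beta) tuples, one per round.
    """
    l = sum(labels)          # number of positives
    m = len(labels) - l      # number of negatives
    w = [1.0 / (2 * l) if y == 1 else 1.0 / (2 * m) for y in labels]
    chosen = []
    for _ in range(rounds):
        # 1. Normalize the weights so they form a probability distribution.
        total = sum(w)
        w = [wi / total for wi in w]
        # 2. Weighted error of every candidate; keep the lowest (first on ties).
        errors = [sum(wi * abs(h - y) for wi, h, y in zip(w, preds, labels))
                  for preds in predictions]
        j = min(range(len(errors)), key=errors.__getitem__)
        eps = errors[j]
        # Assumption: clamp so beta stays finite and non-zero for perfect classifiers.
        eps_safe = min(max(eps, 1e-10), 1.0 - 1e-10)
        beta = eps_safe / (1.0 - eps_safe)
        # 3. Shrink the weights of correctly classified examples by beta.
        w = [wi * (beta if h == y else 1.0)
             for wi, h, y in zip(w, predictions[j], labels)]
        chosen.append((j, eps, beta))
    return chosen


labels = [1, 1, 0, 0]
predictions = [
    [1, 1, 0, 1],  # h_0: wrong on the last example
    [1, 0, 0, 0],  # h_1: wrong on the second example
    [0, 0, 1, 1],  # h_2: wrong everywhere
]
selected = adaboost_select(predictions, labels, rounds=2)
```

In this toy run the first round picks h_0; the example it misses then carries half of the total weight, so the second round prefers h_1, which classifies that example correctly.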
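Weak learner constructor (illustrated)
Training set with weights $w_1, w_2, \ldots, w_n$
Features $f_j$: over 180,000 features for each sub-image, $j = 1, 2, 3, \ldots, 180{,}000$
Errors: $\epsilon_j = \sum_i w_i \, |h_j(x_i) - y_i|$ for each of $h_1, h_2, h_3, \ldots, h_{180{,}000}$
Choose the classifier $h_t$ with the lowest error $\epsilon_t$ (the minimum over all $\epsilon_j$).
Normalize the weights $w_1, w_2, \ldots, w_n$
Update the weights: $w_{t+1,i} = w_{t,i}\,\beta_t$ for correctly classified examples; the weights of missed examples are unchanged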
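Training the weak learner (illustrated)
X (training set): face examples and non-face examples, each mapped to its feature value $f_j(x)$
Example: if $f_j(x) > \theta_j$, x is classified as a face, $h_j(x_i) = 1$
Non-face examples classified as faces are false positives; face examples classified as non-faces are false negatives
Error: $\epsilon_j = \sum_i w_i \, |h_j(x_i) - y_i|$
$$h_j(x) = \begin{cases} 1 & \text{if } p_j f_j(x) < p_j \theta_j \\ 0 & \text{otherwise} \end{cases}$$
where $f_j$ is a feature, $\theta_j$ is a threshold, and $p_j$ is a parity indicating the direction of the inequality sign.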
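A weak classifier of this form, h(x) = 1 if p·f(x) < p·θ and 0 otherwise, has only two parameters to fit per feature: the threshold θ and the parity p. The sketch below finds them by brute force over candidate thresholds. The search strategy, the choice of midpoints as candidates, and the names `stump_predict` and `train_stump` are assumptions for illustration; the slides do not specify how the threshold is searched.

```python
def stump_predict(f, theta, p):
    """h(x) = 1 if p * f(x) < p * theta, else 0."""
    return 1 if p * f < p * theta else 0


def train_stump(values, labels, weights):
    """Return (error, theta, p) minimising the weighted error for one feature.

    values[i] is f(x_i), labels[i] is y_i in {0, 1}, weights[i] is w_i.
    """
    distinct = sorted(set(values))
    # Candidate thresholds: below all values, between neighbours, above all values.
    candidates = ([distinct[0] - 1.0]
                  + [(a + b) / 2.0 for a, b in zip(distinct, distinct[1:])]
                  + [distinct[-1] + 1.0])
    best = None
    for theta in candidates:
        for p in (1, -1):
            err = sum(w * abs(stump_predict(f, theta, p) - y)
                      for f, y, w in zip(values, labels, weights))
            if best is None or err < best[0]:
                best = (err, theta, p)
    return best


# Toy data: faces (label 1) have large feature values, non-faces small ones.
values = [1.0, 2.0, 6.0, 7.0]
labels = [0, 0, 1, 1]
weights = [0.25, 0.25, 0.25, 0.25]
best_error, best_theta, best_parity = train_stump(values, labels, weights)
```

With parity -1 the test becomes f(x) > θ, which matches the slide's example of calling x a face when its feature value exceeds the threshold.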
AdaBoosting
Place the most weight on the examples most often misclassified by the preceding weak rules
Forcing the base learner to focus its attention on the "hardest" examples
The Boost algorithm for classifier learning
Step 1: Given example images $(x_1, y_1), \ldots, (x_n, y_n)$
Step 2: Initialize the weights $w_{1,i} = \frac{1}{2m}, \frac{1}{2l}$ for $y_i = 0, 1$ respectively, where $m$ and $l$ are the numbers of negatives and positives
For t = 1, ..., T
1. Normalize the weights
2. For each feature j, train a classifier $h_j$ which is restricted to using a single feature (the weak learner constructor), with error $\epsilon_j = \sum_i w_i \, |h_j(x_i) - y_i|$
3. Update the weights, using $\beta_t = \frac{\epsilon_t}{1 - \epsilon_t}$
Final strong classifier: built from the selected weak classifiers $h_1, h_2, h_3, \ldots$
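The slide shows the final strong classifier only as a diagram. In the notation used here, and as given in the original Viola and Jones paper, it is a weighted vote of the selected weak classifiers:

$$h(x) = \begin{cases} 1 & \text{if } \sum_{t=1}^{T} \alpha_t h_t(x) \ge \frac{1}{2} \sum_{t=1}^{T} \alpha_t \\ 0 & \text{otherwise} \end{cases} \qquad \alpha_t = \log \frac{1}{\beta_t}$$

A weak classifier with a small error $\epsilon_t$ has a small $\beta_t$ and therefore a large vote $\alpha_t$.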
The Big Picture on the testing process
Feature set -> feature selection and classifier
Stage 1: AdaBoost learner with $h_1$. Pass -> stage 2; false -> reject
Stage 2: AdaBoost learner with $h_1, h_2, \ldots, h_{10}$. Pass -> stage 3; false -> reject
Stage 3: AdaBoost learner with $h_1, h_2, \ldots$ (more). Pass -> next stage; false -> reject
Early stages: reject as many negatives as possible while minimizing false negatives
Stage 1: 100% detection rate, 50% false positive rate
A tremendously difficult problem
How to determine:
  The number of classifier stages
  The number of features in each stage
  The threshold of each stage
Training examples (face and non-face) -> feature selection and classifier
Stage 1: AdaBoost learner with $h_1$; 100% detection rate, 50% false positive rate. Pass -> stage 2; false -> reject
Stage 2: AdaBoost learner with $h_1, h_2, \ldots, h_{10}$. Pass -> next stage; false -> reject
Result
A 38-layer cascaded classifier was trained to detect frontal upright faces
Training set:
  Face: 4916 hand-labeled faces with resolution 24 x 24.
  Non-face: 9544 images containing no faces (350 million sub-windows within these non-face images)
Features
  The first five layers of the detector have 1, 10, 25, 25 and 50 features
  Total number of features in all layers: 6061
Result
Each classifier in the cascade was trained with:
  Face: the 4916 faces plus their vertical mirror images, 9832 images in total
  Non-face sub-windows: 10,000 (size 24 x 24)
Outline
Result
  Speed of the final detector
  Image processing
  Scanning the detector
  Integration of multiple detections
  Experiments on a real-world test set
Speed of the final detector
Result
The speed is directly related to the number of features evaluated per scanned sub-window.
MIT+CMU test set
  An average of 10 features out of a total of 6061 are evaluated per sub-window.
  On a 700 MHz Pentium III, a 384 x 288 pixel image is processed in about 0.067 seconds (using a starting scale of 1.25 and a step size of 1.5)
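As a quick consistency check, the per-image time quoted above corresponds to the frame rate given earlier in the talk:

$$\frac{1}{0.067\ \text{s}} \approx 14.9\ \text{frames per second} \approx 15\ \text{frames per second}$$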
Image processing
Result
Minimize the effect of different lighting conditions
  Sub-windows are variance normalized
reference: http://www.ic.sunysb.edu/Stu/sewang/papers/Fingerprint%20Classification%20by%20Directional%20Fields.pdf
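Variance normalization needs only the pixel sum and the sum of squared pixels of a sub-window, since the variance equals the mean of the squares minus the square of the mean. The sketch below works on a flat list of pixel values; the function names are assumptions, and subtracting the mean before dividing by the standard deviation is an assumption of this example rather than something the slide states. In a detector both sums could be read from integral images of the image and of the squared image.

```python
import math


def window_stats(pixels):
    """Return (mean, standard deviation) from the sum and the sum of squares."""
    n = len(pixels)
    total = sum(pixels)
    total_sq = sum(p * p for p in pixels)
    mean = total / n
    variance = total_sq / n - mean * mean  # E[x^2] - (E[x])^2
    return mean, math.sqrt(max(variance, 0.0))  # guard against tiny negative rounding


def variance_normalize(pixels):
    """Rescale a sub-window to zero mean and unit variance (assumed convention)."""
    mean, std = window_stats(pixels)
    if std == 0.0:
        return [0.0 for _ in pixels]  # flat window: nothing to normalize
    return [(p - mean) / std for p in pixels]


window = [1, 2, 3, 4]
window_mean, window_std = window_stats(window)
normalized = variance_normalize(window)
normalized_mean, normalized_std = window_stats(normalized)
```

After normalization, two sub-windows showing the same pattern under different brightness or contrast give the same feature values.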
Scanning the detector
Result
The final detector is scanned across the image at multiple scales and locations
  Scaling is achieved by scaling the detector itself rather than the image
  Good results are obtained using a set of scales a factor of 1.25 apart
Locations are obtained by shifting the window some number of pixels $\Delta$
  If the current scale is s, the window is shifted by $[s\Delta]$, where $[\,]$ is the rounding operation
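The scanning rule above, with a shift of [sΔ] at scale s, can be sketched as a Python generator of sub-window positions. The function name `scan_windows`, its parameter names, and the minimum shift of one pixel are assumptions for illustration; note also that Python's built-in `round` rounds exact halves to the nearest even integer, which may differ from the rounding the authors used.

```python
def scan_windows(width, height, base=24, start_scale=1.0, scale_step=1.25, delta=1.0):
    """Yield (x, y, size) for every sub-window the detector would visit.

    The detector itself is scaled (size = base * scale) rather than the image,
    and at scale s the window is shifted by round(s * delta) pixels.
    """
    scale = start_scale
    while True:
        size = int(round(base * scale))
        if size > width or size > height:
            break  # the scaled detector no longer fits in the image
        shift = max(1, int(round(scale * delta)))  # assumption: never shift by 0
        for y in range(0, height - size + 1, shift):
            for x in range(0, width - size + 1, shift):
                yield x, y, size
        scale *= scale_step


# A 26 x 24 image only fits the base 24 x 24 detector, at three x positions.
small = list(scan_windows(26, 24))
```

Larger values of `delta` or `scale_step` visit fewer sub-windows and so run faster, at the cost of possibly missing faces that fall between positions.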
Integration of multiple detections
Result
Multiple detections will usually occur around each face and around some types of false positives.
A post-process is applied to the detected sub-windows in order to combine overlapping detections into a single detection
  The detections are partitioned into disjoint subsets: two detections are in the same subset if their bounding regions overlap
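One way to sketch this post-process in Python is to group overlapping boxes with a small union-find structure and then emit one box per group. The slides state only the grouping criterion; averaging the corners of each group to produce the final box is assumed here as the merge rule, and the names `overlaps` and `merge_detections` are hypothetical.

```python
def overlaps(a, b):
    """Boxes are (x1, y1, x2, y2); True if the two regions intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def merge_detections(boxes):
    """Partition boxes into overlap-connected groups; return one box per group."""
    parent = list(range(len(boxes)))

    def find(i):
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if overlaps(boxes[i], boxes[j]):
                parent[find(i)] = find(j)  # put both boxes in the same subset

    groups = {}
    for i, box in enumerate(boxes):
        groups.setdefault(find(i), []).append(box)
    # Assumed merge rule: average each corner coordinate over the group.
    return [tuple(sum(coord) / len(group) for coord in zip(*group))
            for group in groups.values()]


detections = [(0, 0, 10, 10), (2, 2, 12, 12), (50, 50, 60, 60)]
merged = sorted(merge_detections(detections))
```

Because grouping is transitive, a chain of pairwise-overlapping boxes ends up in one subset even if its first and last boxes do not overlap directly.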
Experiments on a real-world test set
Result
The MIT+CMU frontal face test set consists of 130 images with 507 labeled frontal faces
Table: detection rates for various numbers of false positives on the MIT+CMU test set containing 130 images and 507 faces.
Experiments on a real-world test set
Result
Figure: ROC curve for our face detector on the MIT+CMU test set (x-axis: false positives; y-axis: correct detection rate).
The detector was run using a step size of 1.0 and a starting scale of 1.0 (75,081,800 sub-windows scanned)
A simple voting scheme to further improve results
Result
Running three detectors
  The 38-layer one described above plus two similarly trained detectors
  Output the majority vote of the three detectors
The improvement would be greater if the detectors were more independent.
Conclusion
An object detection approach that minimizes computation time while achieving high detection rates
This paper brings together new algorithms, representations and insights which are quite generic
The detector is approximately 15 times faster than any previous approach