Face recognition and deep learning by Dr. สรรพฤทธิ์ มฤคทัต...

Face Recognition & Deep Learning

Standard procedure

• Image capturing: camera, webcam, surveillance

• Face detection: locate faces in the image

• Face alignment: normalize size, rectify rotation

• Face matching

• 1:1 Face verification

• 1:N Face recognition

Viola-Jones Haar-like detector (OpenCV haarcascade_frontalface_alt2.xml)

Face size: about 35x35 to 80x80 pixels

Problem cases: too small, occlusion, rotation

Recognition = compare these faces to known faces

Controlled environment: face size 218x218 pixels

Viola-Jones eye detector

Before alignment: eyes distance = 81 pixels, eyes angle = -0.7 degrees

After alignment: face size = 180x200 pixels, eyes distance = 100 pixels, eyes angle = 0 degrees
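As a rough sketch of the alignment step, the rotation and scale can be derived from the two detected eye centers. The eye coordinates below are hypothetical values chosen to reproduce the figures above (81 pixels, about -0.7 degrees); the function name and the 100-pixel target are illustrative assumptions, and the sign of the angle depends on the image coordinate convention.

```python
import math

def alignment_params(left_eye, right_eye, target_distance=100.0):
    """Return (angle_degrees, scale) describing the eye line.

    Rotating the image by -angle levels the eyes; multiplying its size by
    scale brings the eye distance to target_distance.
    """
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    distance = math.hypot(dx, dy)
    angle = math.degrees(math.atan2(dy, dx))  # 0 when the eyes are level
    scale = target_distance / distance
    return angle, scale

# Hypothetical eye centers (x, y) in pixels.
angle, scale = alignment_params((60.0, 90.0), (141.0, 89.0))
print(f"eyes angle = {angle:.1f} degrees, scale factor = {scale:.3f}")
```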

Comparing faces

• Face image

• Bitmap of size 180x200 pixels

• Grayscale (0-255)

• 36,000 values/face image

• Given 2 face images x1 and x2

• x1(x,y) - x2(x,y)

• | x1(x,y) - x2(x,y) |

• (x1(x,y) - x2(x,y))^2

• What should be used?
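One way to see why the choice matters is a small experiment with synthetic images (the checkerboard offset below is an invented example, not face data). Summed over all pixels, signed differences can cancel out, while absolute and squared differences cannot.

```python
import numpy as np

# Work in float: image arrays are often uint8, where subtraction wraps around.
x1 = np.full((200, 180), 128.0)                 # 200 rows x 180 columns = 36,000 values
rows, cols = np.indices(x1.shape)
offset = np.where((rows + cols) % 2 == 0, 10.0, -10.0)  # +10 / -10 checkerboard
x2 = x1 + offset                                # clearly a different image

diff = x1 - x2
signed_sum = diff.sum()            # positive and negative differences cancel
absolute_sum = np.abs(diff).sum()  # every differing pixel contributes
squared_sum = (diff ** 2).sum()    # larger differences are penalized more

print(signed_sum, absolute_sum, squared_sum)
```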

Basic Maths

• 1 Face image = 1 vector

• 36,000 dimensions (d)

• matrix with 1 column

• Distance

• Euclidean distance

• Norm-p distance

• Norm-1 distance

• Norm-infinity distance
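The distances listed above can be sketched with NumPy as follows. The helper name `norm_p_distance` and the two short vectors are illustrative; real face vectors would have 36,000 entries.

```python
import numpy as np

def norm_p_distance(a, b, p):
    """Norm-p distance: (sum_i |a_i - b_i|^p)^(1/p)."""
    return float(np.sum(np.abs(a - b) ** p) ** (1.0 / p))

a = np.array([0.0, 3.0, 1.0])
b = np.array([4.0, 0.0, 1.0])

euclidean = norm_p_distance(a, b, 2)        # norm-2
norm_1 = norm_p_distance(a, b, 1)           # sum of absolute differences
norm_inf = float(np.max(np.abs(a - b)))     # limit of norm-p as p grows

# np.linalg.norm gives the same values through its `ord` parameter.
assert np.isclose(euclidean, np.linalg.norm(a - b))
assert np.isclose(norm_1, np.linalg.norm(a - b, ord=1))
assert np.isclose(norm_inf, np.linalg.norm(a - b, ord=np.inf))
```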

Pixels importance and projection

• Not all pixels have the same importance

• Pixel with low variation -> not important

• Pixel with large variation -> could be important

Projection

When ||w|| = 1, w^T x is the projection of x onto axis w
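A minimal two-dimensional sketch of this projection (the vectors are arbitrary illustrative values): after normalizing w, the scalar w^T x is the coordinate of x along w, and what is left over is orthogonal to w.

```python
import numpy as np

x = np.array([3.0, 4.0])
w = np.array([1.0, 1.0])
w = w / np.linalg.norm(w)      # enforce ||w|| = 1

coordinate = w @ x             # scalar w^T x: position of x along axis w
projected = coordinate * w     # the projected point, back in the original space
residual = x - projected       # part of x that the axis w does not capture

print(coordinate, projected, residual)
```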

Subspace projection

• What should be the axis w?

• How many axes do we need?

Principal Component Analysis PCA (1)

• Basic idea

• Measure of information = variance

• Variance of z1,…,zN for real numbers zt

• Given a set of face vectors x1,…,xN and an axis w, the variance of w^T x1,…,w^T xN is w^T C w, where C is the covariance matrix
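The formulas on this slide did not survive extraction. A plausible reconstruction, assuming the 1/N normalization (the original may have used 1/(N-1)), is:

```latex
\begin{align}
\bar{z} &= \frac{1}{N}\sum_{t=1}^{N} z_t,
&
\operatorname{Var}(z_1,\dots,z_N) &= \frac{1}{N}\sum_{t=1}^{N}\left(z_t-\bar{z}\right)^2 .
\end{align}
With $z_t = w^{T}x_t$ and $\bar{x} = \frac{1}{N}\sum_{t=1}^{N} x_t$:
\begin{align}
\operatorname{Var}(w^{T}x_1,\dots,w^{T}x_N)
  &= \frac{1}{N}\sum_{t=1}^{N}\left(w^{T}x_t - w^{T}\bar{x}\right)^2 \\
  &= w^{T}\left[\frac{1}{N}\sum_{t=1}^{N}(x_t-\bar{x})(x_t-\bar{x})^{T}\right] w
   = w^{T} C\, w ,
\end{align}
where $C = \frac{1}{N}\sum_{t=1}^{N}(x_t-\bar{x})(x_t-\bar{x})^{T}$ is the $d \times d$ covariance matrix.
```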

Principal Component Analysis PCA (2)

• Best axis w is obtained by maximizing w^T C w with constraint ||w|| = 1

• w is an eigenvector of C: Cw = a w

• Variance w^T C w = a is the corresponding eigenvalue of w

• PCA

• Construct Covariance matrix C

• Eigen-decompose C

• Select the m eigenvectors with the largest eigenvalues
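The three PCA steps can be sketched in NumPy as below. This is a minimal illustration on synthetic 3-dimensional data, assuming a 1/N covariance normalization and samples stored one per row; the function name `pca` is illustrative.

```python
import numpy as np

def pca(X, m):
    """X: N x d data matrix, one sample per row.

    Returns (mean, W, eigenvalues) where the columns of W (d x m) are the
    eigenvectors of the covariance matrix with the m largest eigenvalues.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    C = Xc.T @ Xc / X.shape[0]             # 1. construct covariance matrix (d x d)
    eigvals, eigvecs = np.linalg.eigh(C)   # 2. eigen-decompose (C is symmetric)
    order = np.argsort(eigvals)[::-1][:m]  # 3. keep the m largest eigenvalues
    return mean, eigvecs[:, order], eigvals[order]

# Synthetic data whose variance differs strongly between coordinates.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) * np.array([5.0, 1.0, 0.1])

mean, W, variances = pca(X, m=2)
Z = (X - mean) @ W                         # projected values, N x m
```

The variance of each projected coordinate in `Z` equals the corresponding eigenvalue, which is the property w^T C w = a stated above.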

Eigenface (1)

• What is the problem with face data?

• Solution

Covariance matrix: d x d matrix

Dot matrix: N x N matrix
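A sketch of how the N x N dot matrix can stand in for the d x d covariance matrix, assuming centered data Xc with one face per row: if v is an eigenvector of Xc Xc^T / N with eigenvalue a, then Xc^T v is an eigenvector of Xc^T Xc / N with the same eigenvalue. The random "faces" and the function name `eigenfaces` are stand-ins for illustration.

```python
import numpy as np

def eigenfaces(X, m):
    """X: N x d matrix of face vectors (N much smaller than d).

    Eigen-decomposes the N x N dot matrix instead of the d x d covariance
    matrix, then maps its eigenvectors back to d-dimensional axes.
    """
    N = X.shape[0]
    mean = X.mean(axis=0)
    Xc = X - mean
    G = Xc @ Xc.T / N                       # N x N dot matrix
    eigvals, V = np.linalg.eigh(G)
    order = np.argsort(eigvals)[::-1][:m]
    eigvals, V = eigvals[order], V[:, order]
    W = Xc.T @ V                            # d x m, eigenvectors of the covariance
    W /= np.linalg.norm(W, axis=0)          # rescale each axis to unit length
    return mean, W, eigvals

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 36000))            # 40 stand-in "faces" of 180x200 pixels
mean, W, eigvals = eigenfaces(X, m=10)
```

With N = 40 and d = 36,000 this decomposes a 40 x 40 matrix rather than a 36,000 x 36,000 one.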

Eigenface (2)

• We work with vectors of projected values

[Figure: enrollment of face vectors x1, x2, …, x40 as templates, and a new face vector x]

Eigenface (3)

• Vector of raw intensity: 36,000 dimensions

• Vector of Eigenface coefficients: 10 dimensions

• Eigenface with a large eigenvalue = large variation

• Eigenface with a small eigenvalue = noise
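A sketch of how the 10-dimensional coefficient vectors might be used as templates. Everything here is a stand-in: the axes W are random orthonormal vectors rather than trained Eigenfaces, the mean is zero, and the gallery "faces" and IDs are synthetic.

```python
import numpy as np

rng = np.random.default_rng(2)
d, m = 36000, 10

# Stand-in projection: random orthonormal axes instead of trained Eigenfaces.
W, _ = np.linalg.qr(rng.normal(size=(d, m)))
mean = np.zeros(d)

def coefficients(x):
    """Project a 36,000-dim face vector to its m-dim coefficient vector."""
    return (x - mean) @ W

# Enrollment: store one template (coefficient vector) per known face.
ids = ["A", "B", "C", "D", "E"]
gallery = rng.normal(size=(len(ids), d))
templates = np.array([coefficients(x) for x in gallery])

# Matching: a noisy new image of subject "D" is compared to every template.
probe = gallery[3] + 0.1 * rng.normal(size=d)
distances = np.linalg.norm(templates - coefficients(probe), axis=1)
best_id = ids[int(np.argmin(distances))]
print(best_id)
```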

Related techniques

• Fisherface (LDA)

• Nullspace LDA

• Laplacianface

• Locality Sensitive Discriminant Analysis

• 2DPCA

• 2DLDA

• 2DPCA+2DLDA

Result on ORL (~10 years ago)

Technique       Accuracy (%)   #dim
Eigenface       90-95          200
Fisherface      91-97          50
NLDA            92-97          40
Laplacianface   89-95          50
LSDA            91-97          50
2DPCA           91.5
2DLDA           90.5
2DPCA+2DLDA     93.5

Limitations

• Occlusion: glasses, beard

• Lighting condition

• Facial expression

• Pose

• Make-up

Evaluation

• Accuracy: find closest template and check the ID

• Verification (access control)

• Live captured image VS. stored image

• We have distance -> Should we accept or not?

• False Accept (FA) VS. False Reject (FR)

• From a set of face images

• Compute distances between all pairs

• Select threshold T that gives 0 FA and X FR

• Number of tries
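The threshold selection can be sketched as follows, assuming a pair is accepted when its distance is strictly below T. The distance values are made up for illustration: "genuine" are distances between images of the same person, "impostor" between different people.

```python
import numpy as np

def zero_fa_threshold(genuine, impostor):
    """Pick the largest T that accepts no impostor pair (accept if distance < T).

    Returns (T, number_of_false_rejects).
    """
    T = float(np.min(impostor))                 # any larger T would accept an impostor
    false_rejects = int(np.sum(genuine >= T))   # genuine pairs that are turned away
    return T, false_rejects

# Made-up distances for illustration only.
genuine = np.array([0.8, 1.1, 1.4, 2.6, 0.9])
impostor = np.array([2.2, 3.0, 3.5, 4.1])

T, false_rejects = zero_fa_threshold(genuine, impostor)
false_accepts = int(np.sum(impostor < T))
print(T, false_accepts, false_rejects)
```

Lowering T never adds false accepts but can add false rejects, which is the FA versus FR trade-off named above.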

[Figure: distance axis with threshold T]

Labeled Faces in the Wild

• Large number of subjects (>5,000)

• Unconstrained conditions

• Human performance 97-99%

• Traditional methods fail

• New alignment technique: funneling

LFW results

Use outside data to train the model

Deep Learning

Neural Network timeline

McCulloch & Pitts Neuron model (1943)

Perceptron limitation (1969)

Backprop algorithm (1970s-80s)

SVM (1992)

Deep Learning (2006)

• Return of Neural Network

• Focus on Deep Structure

• Take advantage of today's computing power

Neural Networks (1)

• Neurons are connected via synapses

• A neuron receives signals from other neurons

• When the activation reaches a threshold, it fires a signal to other neurons
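A minimal sketch of this threshold behaviour in the style of the McCulloch & Pitts model. The weights and threshold are hand-picked for illustration so that the unit behaves like a logical AND.

```python
import numpy as np

def neuron_fires(inputs, weights, threshold):
    """Fire (True) when the weighted sum of incoming signals reaches the threshold."""
    activation = float(np.dot(inputs, weights))
    return activation >= threshold

# Hand-picked parameters: the unit fires only when both inputs are active.
weights = np.array([1.0, 1.0])
threshold = 2.0

for a in (0, 1):
    for b in (0, 1):
        print(a, b, neuron_fires(np.array([a, b]), weights, threshold))
```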

http://en.wikipedia.org/wiki/Neuron

Neural Networks (2)

• Universal Approximator

• Classical structure: MLP

• #hidden nodes, learning rate

• Backprop algorithm

• Gradient

• Direction of change that increases value of objective function

• Vector of partial derivatives with respect to each parameter

• Works on all structures, all objective functions

• Stopping criteria, local optima, vanishing/exploding gradients
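As an illustration of the gradient as a vector of partial derivatives, the sketch below runs backprop on a tiny MLP (one tanh hidden layer, squared-error objective, both assumptions made for this example), checks one partial derivative numerically, and takes one step against the gradient to decrease the objective.

```python
import numpy as np

def loss_and_grads(params, x, y):
    """Forward and backward pass of a small MLP with one tanh hidden layer."""
    W1, b1, W2, b2 = params
    h = np.tanh(W1 @ x + b1)           # hidden activations
    out = W2 @ h + b2                  # linear output layer
    err = out - y
    loss = 0.5 * float(err @ err)      # squared-error objective
    # Backward pass: chain rule applied layer by layer.
    dW2 = np.outer(err, h)
    db2 = err
    dpre = (W2.T @ err) * (1.0 - h ** 2)   # derivative of tanh is 1 - tanh^2
    dW1 = np.outer(dpre, x)
    db1 = dpre
    return loss, [dW1, db1, dW2, db2]

rng = np.random.default_rng(3)
params = [rng.normal(size=(4, 3)), rng.normal(size=4),
          rng.normal(size=(2, 4)), rng.normal(size=2)]
x, y = rng.normal(size=3), rng.normal(size=2)

loss, grads = loss_and_grads(params, x, y)

# Numerical check of a single partial derivative (central difference).
eps = 1e-6
plus = [p.copy() for p in params]
minus = [p.copy() for p in params]
plus[0][1, 2] += eps
minus[0][1, 2] -= eps
numeric = (loss_and_grads(plus, x, y)[0] - loss_and_grads(minus, x, y)[0]) / (2 * eps)

# The gradient points uphill, so a descent step moves against it.
learning_rate = 0.001
stepped = [p - learning_rate * g for p, g in zip(params, grads)]
new_loss = loss_and_grads(stepped, x, y)[0]
```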

Deep Learning

• 2006 Hinton et al.: layer by layer construction -> pre-training

• Stack of RBMs, Stack of Autoencoders

• Convolutional NN (CNN)

• Shared weights

• Take advantage of GPU

CNN today

• Common components

• Convolution layer, Max-pooling layer

• ReLU

• Drop-out, Sampling+flip training data

• GPU

• Tools: Caffe, TensorFlow, Theano, Torch

• Structure: LeNet, AlexNet, GoogLeNet
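The convolution, ReLU, and max-pooling components can be sketched in plain NumPy. This is a single-channel illustration with a hand-written edge kernel, not how the listed tools implement these layers; like most deep learning frameworks, the "convolution" here is computed without flipping the kernel.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide one kernel over the image; the same weights are shared at every position."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Keep positive responses, zero out negative ones."""
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Non-overlapping max-pooling; rows/columns that do not fill a block are dropped."""
    H2, W2 = x.shape[0] // size, x.shape[1] // size
    blocks = x[:H2 * size, :W2 * size].reshape(H2, size, W2, size)
    return blocks.max(axis=(1, 3))

# 6x6 image with a vertical edge: left half dark (0), right half bright (1).
image = np.zeros((6, 6))
image[:, 3:] = 1.0
kernel = np.array([[-1.0, 1.0]])   # responds to dark-to-bright transitions

features = max_pool(relu(conv2d_valid(image, kernel)))
print(features)
```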

LeNet

AlexNet

GoogLeNet

Microsoft deep residual network (ResNet): 152 layers!

DeepID (Sun et al., CVPR 2014)

• 160 dim, 60 regions, flipped

• 19,200 dimensions!!

• Input to other model

• CelebFace

• Refine training
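The 19,200 figure follows from 160 dimensions x 60 regions x 2 (original and flipped). The sketch below only illustrates that bookkeeping: `stand_in_extractor` is a hypothetical placeholder (a fixed random projection), not the trained networks from the paper, and the 8x8 patches are synthetic.

```python
import numpy as np

FEATURE_DIM = 160   # dimensions per region
NUM_REGIONS = 60    # face regions

def stand_in_extractor(patch, region_index):
    """Hypothetical placeholder for a trained per-region network.

    Returns a 160-dim vector via a fixed random projection of the patch.
    """
    rng = np.random.default_rng(region_index)
    projection = rng.normal(size=(FEATURE_DIM, patch.size))
    return projection @ patch.ravel()

def face_descriptor(patches):
    """Concatenate features of every region and of its horizontally flipped copy."""
    parts = []
    for k, patch in enumerate(patches):
        parts.append(stand_in_extractor(patch, k))
        parts.append(stand_in_extractor(np.fliplr(patch), k))
    return np.concatenate(parts)

# Synthetic 8x8 patches standing in for the 60 face regions.
patches = [np.random.default_rng(100 + k).random((8, 8)) for k in range(NUM_REGIONS)]
descriptor = face_descriptor(patches)
print(descriptor.shape)
```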

Learning technique for deep structure

Big data

Computing power: GPU, etc.
