Hierarchical Neural Networks for Object Recognition and Scene “Understanding”


Page 1: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

Page 2: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

2

Object Recognition

Task: Given an image containing foreground objects, predict one of a set of known categories.

“Airplane” “Motorcycle” “Fox”

From Mick Thomure, Ph.D. Defense, PSU, 2013

Page 3: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

3

Selectivity: ability to distinguish categories

“Bird”

“No-Bird”

From Mick Thomure, Ph.D. Defense, PSU, 2013

Page 4: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

4

Invariance: tolerance to variation

In-category variation, rotation, scaling, translation

From Mick Thomure, Ph.D. Defense, PSU, 2013

Page 5: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

• What makes object recognition a hard task for computers?

Page 6: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

From: http://play.psych.mun.ca/~jd/4051/The%20Primary%20Visual%20Cortex.ppt

Page 7: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

From: http://psychology.uwo.ca/fmri4newbies/Images/visualareas.jpg

Page 8: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

Hubel and Wiesel’s discoveries

• Based on single-cell recordings in cats and monkeys

• Found that in V1, most neurons are one of the following types:

– Simple: Respond to edges at particular locations and orientations within the visual field

– Complex: Also respond to edges, but are more tolerant of location and orientation variation than simple cells

– Hypercomplex or end-stopped: Are selective for a certain length of contour

Adapted from: http://play.psych.mun.ca/~jd/4051/The%20Primary%20Visual%20Cortex.ppt

Page 9: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”
Page 10: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

Neocognitron

• Hierarchical neural network proposed by K. Fukushima in the 1980s.

• Inspired by Hubel and Wiesel’s studies of visual cortex.

Page 11: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

HMAX Riesenhuber, M. & Poggio, T. (1999),

“Hierarchical Models of Object Recognition in Cortex”

Serre, T., Wolf, L., Bileschi, S., Riesenhuber, M., and Poggio, T. (2006), “Robust Object Recognition with Cortex-Like Mechanisms”

• HMAX: A hierarchical neural-network model of object recognition.

• Meant to model human vision at level of “immediate recognition” capabilities of ventral visual pathway, independent of attention or other top-down processes.

• Inspired by earlier “Neocognitron” model of Fukushima (1980)

Page 12: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

General ideas behind model

• “Immediate” visual processing is feedforward and hierarchical: low levels detect simple features, which are combined hierarchically into increasingly complex features to be detected

• Layers of the hierarchy alternate between “sensitivity” (detecting specific features) and “invariance” (tolerance to position, scale, and orientation)

• Size of receptive fields increases along the hierarchy

• Degree of invariance increases along the hierarchy

Page 13: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

13

HMAX

State-of-the-art performance on common benchmarks.

1500+ references to HMAX since 1999.

Many extensions and applications:
• Biometrics
• Remote sensing
• Modeling visual processing in biology

From Mick Thomure, Ph.D. Defense, PSU, 2013

Page 14: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

14

Template Matching: selectivity for visual patterns

Pooling: invariance to transformation by combining multiple inputs

[Figure: an input passes through alternating template-matching and pooling stages; template units respond (ON) or not (OFF) depending on the input.]

From Mick Thomure, Ph.D. Defense, PSU, 2013

Page 15: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

15

S1 Layer: edge detection

[Figure: hierarchy diagram — Input → S1 (edge detectors).]

From Mick Thomure, Ph.D. Defense, PSU, 2013

Page 16: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

16

C1 Layer: local pooling

Some tolerance to position and scale.

[Figure: hierarchy diagram — Input → S1 → C1.]

From Mick Thomure, Ph.D. Defense, PSU, 2013

Page 17: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

17

S2 Layer: prototype matching

Match learned dictionary of shape prototypes

[Figure: hierarchy diagram — Input → S1 → C1 → S2.]

From Mick Thomure, Ph.D. Defense, PSU, 2013

Page 18: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

18

C2 Layer: global pooling

Activity is max response to S2 prototype at any location or scale.

[Figure: hierarchy diagram — Input → S1 → C1 → S2 → C2.]

From Mick Thomure, Ph.D. Defense, PSU, 2013

Page 19: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

19

Properties of C2 Activity

1. Activity reflects degree of match

2. Location and size do not matter

3. Only best match counts

[Figure: activation of a C2 unit as its prototype is matched against different inputs.]

From Mick Thomure, Ph.D. Defense, PSU, 2013

Page 20: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

20

Properties of C2 Activity

1. Activity reflects degree of match

2. Location and size do not matter

3. Only best match counts

[Figure: activation of a C2 unit as its prototype is matched against different inputs.]

From Mick Thomure, Ph.D. Defense, PSU, 2013

Page 21: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

21

Properties of C2 Activity

1. Activity reflects degree of match

2. Location and size do not matter

3. Only best match counts

[Figure: activation of a C2 unit as its prototype is matched against different inputs.]

From Mick Thomure, Ph.D. Defense, PSU, 2013

Page 22: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

22

Classifier: predict object category

Output of C2 layer forms a feature vector to be classified.

Some possible classifiers:

SVM

Boosted Decision Stumps

Logistic Regression

[Figure: hierarchy diagram — Input → S1 → C1 → S2 → C2 → classifier.]

From Mick Thomure, Ph.D. Defense, PSU, 2013

Page 23: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

Gabor Filters

Gabor filter: essentially a localized Fourier transform, focusing on a particular spatial frequency, orientation, and scale in the image.

Each filter has an associated wavelength λ (spatial frequency), scale s, and orientation θ.

Its response measures the extent to which wavelength λ is present at orientation θ and scale s, centered about pixel (x, y).
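To make this concrete, here is a minimal Python/NumPy sketch of a Gabor filter in the parameterization typically used by HMAX-style S1 units (wavelength λ, orientation θ, Gaussian envelope width σ, aspect ratio γ). The particular parameter values, the aspect-ratio default, and the normalization are illustrative assumptions, not the exact settings of the model discussed here.

```python
import numpy as np

def gabor_filter(size, wavelength, theta, sigma, gamma=0.3):
    """Real-valued Gabor filter: a Gaussian envelope times a cosine grating.

    size       -- filter width/height in pixels (e.g., 11)
    wavelength -- spatial wavelength (lambda) of the grating, in pixels
    theta      -- orientation in radians
    sigma      -- width of the Gaussian envelope
    gamma      -- aspect ratio of the envelope (assumed default)
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate the coordinate frame by theta.
    x_t = x * np.cos(theta) + y * np.sin(theta)
    y_t = -x * np.sin(theta) + y * np.cos(theta)
    g = np.exp(-(x_t ** 2 + (gamma * y_t) ** 2) / (2 * sigma ** 2)) \
        * np.cos(2 * np.pi * x_t / wavelength)
    g -= g.mean()            # zero mean, so flat regions give no response
    g /= np.linalg.norm(g)   # unit norm
    return g

# Example: an 11x11 filter tuned to one orientation (theta = 0) and wavelength 5.
f = gabor_filter(size=11, wavelength=5.0, theta=0.0, sigma=4.0)
```

Convolving an image with a bank of such filters (different orientations, wavelengths, and sizes) yields the kind of S1-style responses described in the later slides.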

Page 24: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

[Gabor filter examples: θ = 0, λ = 5; θ = 0, λ = 10; θ = 0, λ = 15 (varying wavelength), and θ = 0, λ = 10; θ = 45, λ = 10; θ = 90, λ = 10 (varying orientation).]

http://matlabserver.cs.rug.nl/cgi-bin/matweb.exe

Page 25: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

Examples of Gabor filters of different orientations and scales

From http://www.cs.cmu.edu/~efros/courses/LBMV07/presentations/0130SerreC2Features.ppt

Page 26: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

HMAX: How to set parameters for Gabor filters

• Sample parameter space

• Apply corresponding filters to stimuli commonly used to probe cortical cells (e.g., gratings, bars, edges)

• Select parameter values that capture the tuning properties of V1 simple cells, as reported in the neuroscience literature

Page 27: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

Learning V1 Simple Cells via Sparse Coding

Olshausen & Field, 1996

• Hypotheses:

– Any natural image I(x,y) can be represented by a linear superposition of (not necessarily orthogonal) basis functions ϕᵢ(x,y):

  I(x,y) = Σᵢ aᵢ ϕᵢ(x,y)

– The ϕᵢ span the image space (i.e., any image can be reconstructed with an appropriate choice of aᵢ)

– The aᵢ are as statistically independent as possible

– Any natural image has a representation that is sparse (i.e., can be represented by a small number of non-zero aᵢ)

Page 28: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

• Test of hypothesis: Use these criteria to learn a set of ϕi from a database of natural images.

• Use gradient descent, with the following cost function:

  E = Σ_{x,y} [ I(x,y) − Σᵢ aᵢ ϕᵢ(x,y) ]²  +  λ Σᵢ S(aᵢ / σᵢ)

  The first term is the cost of incorrect reconstruction; the second is the cost of non-sparseness (using too many aᵢ), where S is a nonlinear function and σᵢ is a scaling constant.

Page 29: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

• Training set: a set of 12x12 image patches from natural images.

• Start with large random set (144) of ϕi

• For each patch,

– Find set of ai to minimize E with respect to ai

– With these aᵢ, update the ϕᵢ using this learning rule:

  Δϕᵢ(x,y) = η aᵢ [ I(x,y) − Î(x,y) ]

  where Î(x,y) = Σⱼ aⱼ ϕⱼ(x,y) is the reconstructed image and η is a learning rate.
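Below is a minimal sketch of one step of this alternating procedure, assuming S(x) = log(1 + x²) (one of the sparseness functions considered by Olshausen & Field) and plain gradient descent; the step sizes, iteration counts, and the random stand-in for a natural-image patch are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_pix, n_basis = 144, 144            # 12x12 patches, 144 basis functions
phi = rng.standard_normal((n_pix, n_basis))
phi /= np.linalg.norm(phi, axis=0)   # random initial basis, unit norm

lam, sigma = 0.1, 1.0                # sparseness weight and scaling constant

def sparse_codes(patch, phi, steps=200, lr=0.01):
    """Find coefficients a minimizing ||patch - phi a||^2 + lam * sum(S(a/sigma)),
    with S(x) = log(1 + x^2), by gradient descent."""
    a = np.zeros(phi.shape[1])
    for _ in range(steps):
        residual = patch - phi @ a
        grad = -2 * phi.T @ residual + lam * (2 * a / sigma**2) / (1 + (a / sigma)**2)
        a -= lr * grad
    return a

def update_basis(patch, a, phi, eta=0.01):
    """Move each phi_i toward the reconstruction residual, weighted by its a_i."""
    residual = patch - phi @ a
    phi += eta * np.outer(residual, a)
    phi /= np.linalg.norm(phi, axis=0)   # keep basis functions normalized
    return phi

# One training step on a (hypothetical) 12x12 natural-image patch:
patch = rng.standard_normal(n_pix)
a = sparse_codes(patch, phi)
phi = update_basis(patch, a, phi)
```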

Page 30: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

From http://redwood.berkeley.edu/bruno/papers/current-opinion.pdf

These resemble receptive fields of V1 simple cells: they are (1) localized, (2) orientation-specific, (3) frequency and scale-specific

Page 31: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

S1 units: Gabor filters (one per pixel)

16 scales / frequencies, 4 orientations (0°, 45°, 90°, 135°). Units form a pyramid of scales, from 7×7 to 37×37 pixels, in steps of two pixels. The response of an S1 unit is the absolute value of the filter response.

Page 32: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

C1 unit: Maximum value of a group of S1 units, pooled over slightly different positions and scales

8 scales / frequencies, 4 orientations
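A minimal sketch of this kind of local pooling, assuming each C1 unit takes the maximum over a pair of adjacent S1 scales and over overlapping spatial windows (which is how 16 S1 scales can become 8 C1 scale bands); the window size and step are illustrative values.

```python
import numpy as np

def c1_pool(s1_pair, grid=8, step=4):
    """Local max pooling over position and over a pair of adjacent S1 scales.

    s1_pair -- two S1 response maps (same orientation, adjacent scales),
               each a 2-D array of the same shape
    grid    -- pooling window size in pixels (illustrative)
    step    -- step between windows; step < grid gives overlapping pools
    """
    s1 = np.maximum(*s1_pair)                # max over the two scales
    h, w = s1.shape
    rows = range(0, h - grid + 1, step)
    cols = range(0, w - grid + 1, step)
    return np.array([[s1[r:r + grid, c:c + grid].max() for c in cols]
                     for r in rows])
```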

Page 33: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

From S. Bileschi, Ph.D. Thesis

The S1 and C1 model parameters are meant to match empirical neuroscience data.

Page 34: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

S2 layer

• Recall that each S1 unit responds to an oriented edge at a particular scale

• Each S2 unit responds to a particular group of oriented edges at various scales, i.e., a shape

• S1 units were chosen to cover a “full” range of scales and orientations

• How can we choose S2 units to cover a “full” range of shapes?

Page 35: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

HMAX’s answer: Choose S2 shapes by randomly sampling patches from “training images”

Page 36: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

HMAX’s answer: Choose S2 shapes by randomly sampling patches from “training images”

Extract C1 features in each selected patch. This gives a p×p×4 array, for 4 orientations.
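A minimal sketch of this imprinting step: sample random patches of C1 activity from the training images and store them as prototypes. The candidate patch sizes and the uniform random sampling are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def imprint_prototypes(c1_maps, n_prototypes, patch_sizes=(4, 8, 12, 16)):
    """Sample random p x p x 4 patches of C1 activity as S2 prototypes.

    c1_maps -- list of C1 arrays, one per training image,
               each of shape (height, width, 4 orientations)
    """
    prototypes = []
    for _ in range(n_prototypes):
        c1 = c1_maps[rng.integers(len(c1_maps))]   # pick a training image
        p = int(rng.choice(patch_sizes))           # pick a patch size
        r = rng.integers(c1.shape[0] - p + 1)
        c = rng.integers(c1.shape[1] - p + 1)
        prototypes.append(c1[r:r + p, c:c + p, :].copy())
    return prototypes
```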

Page 37: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

S2 prototype Pi , with 4 orientations

Page 38: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

S2 prototype Pi , with 4 orientations

[Figure: C1 responses at scales 1–8.]

Page 39: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

S2 prototype Pi , with 4 orientations

Input image to classify: C1 layer: 4 orientations, 8 scales

Calculate similarity between Pi and patches in input image, independently at each position and each scale.

[Figure: C1 layer of the input image at scales 1–8.]

Page 40: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

S2 prototype Pi , with 4 orientations

Input image to classify: C1 layer, 4 orientations, 8 scales

[Figure: C1 layer of the input image at scales 1–8.]

Calculate similarity between Pi and patches in input image, independently at each position and each scale.

Similarity (radial basis function):

  R(X, Pᵢ) = exp( −β ‖X − Pᵢ‖² )

where X is a patch of C1 activity the same size as Pᵢ and β controls the sharpness of tuning.

Page 41: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

S2 prototype Pi , with 4 orientations

Input image to classify: C1 layer, 4 orientations, 8 scales

[Figure: C1 layer of the input image at scales 1–8.]

Calculate similarity between Pi and patches in input image, independently at each position and each scale.

Similarity (radial basis function):

Result: At each position in C1 layer of input image, we have an array of 4x8 values. Each value represents the “degree” to which shape Pi is present at the given position at the given orientation and scale.

Page 42: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

S2 prototype Pi , with 4 orientations

Input image to classify: C1 layer, 4 orientations, 8 scales

[Figure: C1 layer of the input image at scales 1–8.]

Calculate similarity between Pi and patches in input image, independently at each position and each scale.

Similarity (radial basis function):

Result: At each position in C1 layer of input image, we have an array of 4x8 values. Each value represents the “degree” to which shape Pi is present at the given position at the given scale. Now, repeat this process for each Pi, to get N such arrays.

Page 43: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

C2 unit: For each Pi, calculate maximum value over all positions, orientations, and scales. Result is N values, corresponding to the N prototypes.
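Putting the S2 and C2 stages together, here is a minimal (deliberately brute-force) sketch: each prototype is compared against every position and scale of the C1 pyramid with the radial basis function, and its C2 value is the global maximum response. The tuning parameter β is an assumption.

```python
import numpy as np

def s2_c2(c1_pyramid, prototypes, beta=1.0):
    """Compute one C2 value per prototype: the max RBF response over all
    positions and scales.

    c1_pyramid -- list of C1 arrays (one per scale), each of shape (h, w, 4)
    prototypes -- list of p x p x 4 prototype arrays
    """
    c2 = np.full(len(prototypes), -np.inf)
    for i, proto in enumerate(prototypes):
        p = proto.shape[0]
        for c1 in c1_pyramid:                    # every scale
            h, w, _ = c1.shape
            for r in range(h - p + 1):           # every position
                for c in range(w - p + 1):
                    patch = c1[r:r + p, c:c + p, :]
                    resp = np.exp(-beta * np.sum((patch - proto) ** 2))
                    c2[i] = max(c2[i], resp)
    return c2
```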

Page 44: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

Support Vector Machine

[Diagram: feature vector representing the image → SVM → classification.]

SVM classification: To classify input image (e.g., “face” or “not face”), give C2 values to a trained support vector machine (SVM).
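A short sketch of this final step with scikit-learn is shown below; the linear kernel and the file names holding precomputed C2 features and labels are assumptions.

```python
import numpy as np
from sklearn.svm import SVC

# X: one row of C2 features per training image; y: labels (e.g., face / not face).
X_train = np.load("c2_train.npy")     # hypothetical precomputed C2 features
y_train = np.load("labels_train.npy") # hypothetical labels

clf = SVC(kernel="linear")            # linear kernel over C2 features
clf.fit(X_train, y_train)

X_test = np.load("c2_test.npy")
predictions = clf.predict(X_test)
```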

Page 45: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

Adaboost Classifier

[Diagram: feature vector representing the image → boosted classifier → classification.]

Boosting classification: To classify an input image (e.g., “face” or “not face”), give the C2 values to a classifier trained with AdaBoost.

Page 46: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

Visual tasks:

(1) Part-based object detection: Detect different types of “part-based” objects, either alone or in “clutter”.

Data sets contain images that either contain or do not contain a single instance of the target object. Task is to decide whether the target object is present or absent.

(2) Texture-based object recognition: Recognize different types of “texture-based” objects (e.g., grass, trees, buildings, roads). Task is to classify each pixel with an object label.

(3) Whole-Scene Recognition: Recognize all objects (“part-based” and “texture-based”) in a scene.

Page 47: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

Databases

• Caltech 5 (Five object categories: leaves, cars, faces, airplanes, motorcycles)

• Caltech 101 (101 object categories)

• MIT Streetscenes

• MIT car and face databases

Page 48: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”
Page 49: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

Sample images from the MIT Streetscenes database

Page 50: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

Sample images from the Caltech 101 dataset

Page 51: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

From Serre et al., Object recognition with features inspired by visual cortex

Sample of results for part-based objects

Results

Accuracy at the equilibrium point (i.e., at the point such that the false positive rate equals the false negative rate).
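As a small worked example of this measure, the sketch below sweeps the decision threshold of a binary classifier until the false positive and false negative rates are as close as possible, and reports accuracy there; the score and label arrays are assumed inputs.

```python
import numpy as np

def equilibrium_accuracy(scores, labels):
    """Accuracy at the threshold where false-positive rate ~= false-negative rate.

    scores -- real-valued classifier outputs, higher = more positive
    labels -- 0/1 ground truth
    """
    labels = np.asarray(labels, dtype=bool)
    best_gap, best_acc = np.inf, 0.0
    for t in np.unique(scores):
        pred = scores >= t
        fpr = np.mean(pred[~labels]) if (~labels).any() else 0.0
        fnr = np.mean(~pred[labels]) if labels.any() else 0.0
        gap = abs(fpr - fnr)
        if gap < best_gap:
            best_gap = gap
            best_acc = np.mean(pred == labels)
    return best_acc
```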

Page 52: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

How to do multiclass classification with SVMs?

Page 53: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

How to do multiclass classification with SVMs?

• Two main methods:

– One versus All

– One versus One (“all pairs”)

Page 54: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

One Versus All

• For N categories:

– Train N SVMs:

“Category 1” versus “Not Category 1”

“Category 2” versus “Not Category 2”

etc.

Run new example through all of them. Prediction is the category with the highest decision score.
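A minimal sketch of one-versus-all classification over C2 feature vectors, using scikit-learn's linear SVM and its decision_function scores (scikit-learn's multiclass wrappers can also do this automatically); the helper function names are illustrative.

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_one_vs_all(X, y, categories):
    """Train one binary SVM per category: 'category k' vs. 'not category k'."""
    models = {}
    for k in categories:
        clf = LinearSVC()
        clf.fit(X, (y == k).astype(int))   # positive class = category k
        models[k] = clf
    return models

def predict_one_vs_all(models, x):
    """Predict the category whose SVM gives the highest decision score."""
    scores = {k: clf.decision_function(x.reshape(1, -1))[0]
              for k, clf in models.items()}
    return max(scores, key=scores.get)
```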

Page 55: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

One Versus One (All pairs)

• Train N * (N – 1)/2 SVMs:

“Category 1” versus “Category 2”

“Category 1” versus “Category 3”

...

“Category 2” versus “Category 3”

etc.

• To predict category for a new instance, run SVMs in a “decision tree”:

[Diagram: elimination tree of pairwise SVMs — e.g., run “Category 1 vs Category 2”; the winner is then run against Category 3, and so on, until a single category remains as the prediction.]

Page 56: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

Whole Scene Interpretation: Streetscenes project (Bileschi, 2006)

Page 57: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

A dense sampling of square windows, at all possible scales and positions, is cropped from the test image, converted to C1 and C2 features, and passed through each possible SVM (or boosting) classifier (one per category).

Result: a real-valued detection strength for each possible category at each possible location and scale.

An object is considered “present” at a position and scale if its detection strength is above a threshold (thresholds are determined empirically).

“Local neighborhood suppression” is used to remove redundant detections.

Experiment 1: Use C1 features

Experiment 2: Use C2 features
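The thresholding and local neighborhood suppression described above can be sketched as a greedy loop over a map of detection strengths for one category at one scale; the suppression radius is an assumed parameter.

```python
import numpy as np

def detect(score_map, threshold, suppress_radius=8):
    """Greedy detection from a map of per-window classifier scores.

    score_map -- 2-D array of detection strengths for one category and scale
    Returns (row, col) positions kept after thresholding and local
    neighborhood suppression.
    """
    scores = score_map.copy().astype(float)
    detections = []
    while True:
        r, c = np.unravel_index(np.argmax(scores), scores.shape)
        if scores[r, c] < threshold:
            break
        detections.append((r, c))
        # Suppress redundant detections in the local neighborhood.
        r0, r1 = max(0, r - suppress_radius), r + suppress_radius + 1
        c0, c1 = max(0, c - suppress_radius), c + suppress_radius + 1
        scores[r0:r1, c0:c1] = -np.inf
    return detections
```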

Page 58: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

Sample results (here, detecting “car”) (Bileschi, 2006)

Page 59: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

Legend for the compared systems:
• Standard Model (C1) = uses C1 features
• HoG = Histogram of Gradients (Triggs)
• Standard Model (C2) = uses 444 S2 prototypes of 4 different sizes
• Part-Based = part-based model of Leibe et al.
• Grayscale = raw grayscale values (normalized in size and histogram-equalized)
• Local patch correlation = similar to the system of Torralba

Results on crop data: Cars

Page 60: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”
Page 61: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

Improvements to HMAX (Mutch and Lowe, 2006)

• “Sparsify” features at different layers

• Localize C2 features

• Do feature selection for SVMs

Page 62: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

“Sparsify” S2 Inputs: Use only dominant C1 orientation at each position
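One way to read this sparsification is sketched below: at each C1 position, keep only the response of the dominant orientation and zero the rest before prototype matching. Treat this as an illustrative interpretation rather than Mutch and Lowe's exact procedure.

```python
import numpy as np

def sparsify_c1(c1):
    """Keep only the dominant orientation at each C1 position.

    c1 -- array of shape (height, width, n_orientations)
    Returns an array of the same shape with non-dominant orientations zeroed.
    """
    dominant = c1.argmax(axis=-1)                       # index of strongest orientation
    mask = np.eye(c1.shape[-1], dtype=bool)[dominant]   # one-hot over orientations
    return np.where(mask, c1, 0.0)
```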

Page 63: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

Localize C2 features

– Assumption: “The system is “attending” close to the center of the object. This is appropriate for datasets such as the Caltech 101, in which most objects of interest are central and dominant.”

– “For more general detection of objects within complex scenes, we augment it with a search for peak responses over object location using a sliding window.”

Page 64: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

Select C2 features that are highly weighted by the SVM

– All-pairs SVM consists of m(m-1)/2 binary SVMs.

– Each represents a separating hyperplane in d dimensions.

– The d components of the (unit length) normal vector to this hyperplane can be interpreted as feature weights; the higher the kth component (in absolute value), the more important feature k is in separating the two classes.

– Drop features with low weight, with weight averaged over all binary SVMs.

– Multi-round tournament: in each round, the SVM is trained, then at most half the features are dropped.
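Here is a sketch of that multi-round tournament using scikit-learn's all-pairs wrapper around a linear SVM. Averaging the absolute components of each unit-length hyperplane normal as the per-feature weight follows the description above; the number of rounds and the stopping rule are assumptions.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.multiclass import OneVsOneClassifier

def svm_feature_tournament(X, y, n_keep, rounds=3):
    """Iteratively drop low-weight features using all-pairs linear SVM weights."""
    keep = np.arange(X.shape[1])                 # indices of surviving features
    for _ in range(rounds):
        if len(keep) <= n_keep:
            break
        clf = OneVsOneClassifier(LinearSVC()).fit(X[:, keep], y)
        # Average absolute weight of each feature over all binary SVMs,
        # after normalizing each hyperplane's weight vector to unit length.
        weights = np.zeros(len(keep))
        for est in clf.estimators_:
            w = est.coef_.ravel()
            weights += np.abs(w) / np.linalg.norm(w)
        weights /= len(clf.estimators_)
        n_next = max(n_keep, len(keep) // 2)     # drop at most half per round
        keep = keep[np.argsort(weights)[::-1][:n_next]]
    return keep
```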

Page 65: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

Number of SVM features – optimized over all categories

Page 66: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

Results (Mutch and Lowe, 2006)

Page 67: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

Results (Mutch and Lowe, 2006)

• Comparative results on Caltech 101 dataset

Page 68: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

From Mutch and Lowe, 2006

Page 69: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

Future improvements?

• Could try to improve SVM by using more complex kernel functions (people have done this).

• “We lean towards future enhancements that are biologically realistic. We would like to be able to transform images into a feature space in which a simple classifier is good enough.”

• “Even our existing classifier is not entirely plausible, as an all-pairs model does not scale well as the number of classes increases.”

Page 70: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

How to learn good “prototype” (S2) features?

Page 71: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

71

Good Features for Object Recognition

Need the right number of discriminative features.

Too few features: classifier cannot distinguish categories.

Too many features: classifier overfits the training data.

Irrelevant features: increase error and compute time.

From Mick Thomure, Ph.D. Defense, PSU, 2013

Page 72: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

72

Prototype Learning

Given example images, learn a small set of prototypes that maximizes performance of the classifier.

From Mick Thomure, Ph.D. Defense, PSU, 2013

Page 73: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

73

Learning Prototypes by Imprinting

Record, or imprint, arbitrary patches of training images.

This is very common: Zhu & Zhang (2008); Faez et al. (2008); Gu et al. (2009); Huang et al. (2008); Wu et al. (2007); Lian & Li (2008); Moreno et al. (2007); Wijnhoven & With (2009); Serre et al. (2007); Mutch & Lowe (2008).

From Mick Thomure, Ph.D. Defense, PSU, 2013

Page 74: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

74

Shape Hypothesis (Serre et al., 2007): invariant representations with imprinted shape prototypes are key to the model’s success.

This is assumed in most of the literature, but has yet to be tested!

From Mick Thomure, Ph.D. Defense, PSU, 2013

Page 75: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

75

Question: Can hierarchical visual models be improved by learning prototypes?

1. Is the shape hypothesis correct?

Compare imprinted and “shape-free” prototypes.

2. Do more sophisticated learning methods do better than imprinting?

Conduct a study of different prototype learning methods.

From Mick Thomure, Ph.D. Defense, PSU, 2013

Page 76: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

76

Glimpse

Hierarchical visual model implementation:
• Fast, parallel, open-source
• Object recognition performance is similar to existing models.

Reusable framework can express wide range of models.

https://pythonhosted.org/glimpse/

From Mick Thomure, Ph.D. Defense, PSU, 2013

Page 77: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

77

Are invariant representations with imprinted shape prototypes key to the model’s success?

From Mick Thomure, Ph.D. Defense, PSU, 2013

Page 78: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

78

Compare imprinted and “shape-free” prototypes, which are generated randomly.

[Figure: an imprinted prototype with the image region it was imprinted from, and a randomly generated “shape-free” prototype with its edge pattern.]

From Mick Thomure, Ph.D. Defense, PSU, 2013

Page 79: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

79

Datasets: Synthetic tasks of Pinto et al. (2011)
• Tests viewpoint-invariant object recognition
• Addresses shortcomings in existing benchmark datasets
• Tunable difficulty
• Difficult for current computer vision systems

From Mick Thomure, Ph.D. Defense, PSU, 2013

Page 80: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

80

Task: Face Discrimination

Face1 Face2

From Mick Thomure, Ph.D. Defense, PSU, 2013

Page 81: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

81

Task: Category Recognition

Cars Airplanes

From Mick Thomure, Ph.D. Defense, PSU, 2013

Page 82: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

82

Variation levels: difficulty increases with increasing change in location, scale, and rotation.

From Mick Thomure, Ph.D. Defense, PSU, 2013

Page 83: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

83

Results: Face1 v. Face2

Error Bar: One standard error

Performance: Mean accuracy over five independent trials

Using 4075 prototypes.

From Mick Thomure, Ph.D. Defense, PSU, 2013

Page 84: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

84

Results: Face1 v. Face2

Error Bar: One standard error

Performance: Mean accuracy over five independent trials

Using 4075 prototypes.

From Mick Thomure, Ph.D. Defense, PSU, 2013

Page 85: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

85

Results: Cars v. Planes

Error Bar: One standard error

Performance: Mean accuracy over five independent trials

Using 4075 prototypes.

From Mick Thomure, Ph.D. Defense, PSU, 2013

Page 86: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

86

Summary

High performance on problems of invariant object recognition is possible with unlearned, “shape-free” features.

Why do random prototypes work?

Still an open question

From Mick Thomure, Ph.D. Defense, PSU, 2013

Page 87: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

87

Do more sophisticated learning methods do better than imprinting?

Many methods could be applied:
• Feature selection (Mutch & Lowe, 2008)
• k-means (Louie, 2003)
• Hebbian learning (Brumby et al., 2009)
• STDP (Masquelier & Thorpe, 2007)
• PCA, ICA, sparse coding

From Mick Thomure, Ph.D. Defense, PSU, 2013

Page 88: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

88

Feature Selection

1. Imprint prototypes

2. Compute features

3. Evaluate features

4. Measure performance

[Diagram: prototypes are imprinted from the training images.]

From Mick Thomure, Ph.D. Defense, PSU, 2013

Page 89: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

89

Feature Selection

1. Imprint prototypes

2. Compute features

3. Evaluate features

4. Measure performance

[Figure: hierarchy diagram — Input → S1 → C1 → S2 → C2.]

From Mick Thomure, Ph.D. Defense, PSU, 2013

Page 90: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

90

Feature Selection

1. Imprint prototypes

2. Compute features

3. Evaluate features

4. Measure performance

[Figure: distributions of feature activation for the two categories, separated by a category boundary.]

From Mick Thomure, Ph.D. Defense, PSU, 2013

Page 91: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

91

Feature Selection

1. Imprint prototypes

2. Compute features

3. Evaluate features

4. Measure performance

Select: most discriminative prototypes.

Compute: Glimpse’s performance on test images using only most discriminative prototypes.

From Mick Thomure, Ph.D. Defense, PSU, 2013

Page 92: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

92

Results: Face1 v. Face2

Error Bar: One standard error

Performance: Mean accuracy over five independent trials, except feature selection.

10,000x Compute Cost

From Mick Thomure, Ph.D. Defense, PSU, 2013

Page 93: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

93

Results: Cars v. Planes

Error Bar: One standard error

Performance: Mean accuracy over five independent trials, except feature selection.

10,000x Compute Cost

From Mick Thomure, Ph.D. Defense, PSU, 2013

Page 94: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

94

Feature Selection

Advantage: Creates discriminative prototypes

Drawbacks:• Computationally expensive• Cannot synthesize new prototypes

Generally consistent with previous work.

From Mick Thomure, Ph.D. Defense, PSU, 2013

Page 95: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

95

k-means

1. Choose many prototypes from training images.

2. Identify k clusters of similar prototypes.

Iterative optimization process.

3. Create new prototypes by combining prototypes from each cluster.

Average of prototypes.

Now, use the k new prototypes in Glimpse to perform object recognition.

From Mick Thomure, Ph.D. Defense, PSU, 2013
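A minimal sketch of this k-means prototype learning with scikit-learn, assuming for simplicity that all the sampled patches share a single size so they can be flattened, clustered, and averaged:

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_prototypes(imprinted, k):
    """Cluster many imprinted patches and use the k cluster means as prototypes.

    imprinted -- list of imprinted patches, all of the same shape (p, p, 4)
    """
    shape = imprinted[0].shape
    X = np.stack([p.ravel() for p in imprinted])      # one row per patch
    km = KMeans(n_clusters=k, n_init=10).fit(X)
    # Each new prototype is the average (centroid) of one cluster of patches.
    return [c.reshape(shape) for c in km.cluster_centers_]
```

The resulting k prototypes would then be used in place of the imprinted ones when computing S2/C2 features.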

Page 96: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

96

k-means

Advantages:• Fast and scalable• Used in similar networks (Coates & Ng, 2012)• Related to sparse coding

Drawback: Prototypes may not be discriminative.

From Mick Thomure, Ph.D. Defense, PSU, 2013

Page 97: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

97

Performance was no better than imprinting, and sometimes worse.

Reason is unclear.

Consistent with previous work?

From Mick Thomure, Ph.D. Defense, PSU, 2013

Page 98: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

98

Can k-means be improved?

“No-Child” “Child”

Methods: Investigate different weightings

Results: Found only marginal increase in performance over k-means

From Mick Thomure, Ph.D. Defense, PSU, 2013

Page 99: Hierarchical Neural Networks for Object Recognition and Scene “Understanding”

99

Summary

Feature Selection: very helpful, but expensive

k-means: fast, no improvement over imprinting

Extended k-means: results similar to k-means

From Mick Thomure, Ph.D. Defense, PSU, 2013