iccv2009 recognition and learning object categories p1 c03 - 3d object models

3D object categorization

Courtesy of Prof. Silvio Savarese (U. Michigan, Ann-Arbor)

Kin

g A

rth

ur

an

d th

e k

nig

hts

of th

e R

ou

nd

Ta

ble

.

3D Object Categorization

• Chiu et al. ‘07

• Hoiem, et al., ’07

• Yan, et al. ’07

• Weber et al. ‘00

• Schneiderman et al. ’01

•Capel et al ’02

•Johnson & Herbert ‘99

•Thomas et al. ‘06

• Kushal, et al., ’07

• Savarese et al, 07, 08

•Bronstein et al, ‘03

•Ruiz-Correa et al. ’03,

•Funkhouser et al ’03

•Bart et al ‘04


Challenges

- how to model 3D shape variability?

- How to model texture (appearance) variability?

- How to link texture (appearance) across views?

Multi-view models


Mixture of 2D single view

models• Weber et al. ‘00


• Bart et al. ‘04

Full 3D models •Bronstein et al, ‘03





• Thomas et al. ‘06




• Yan, et al. ’07• Kushal, et al., ’07

• Liebelt et al 08

• Sun, Su, Savarese et al 09a,

09b

• Single 3D object recognition

• Single view object categorization

• 3D object categorization

Overview

•Zhang et al ’95

•Schmid & Mohr, ‘96

•Schiele & Crowley, ’96

•Lowe, ‘99

•Jacob & Barsi, ‘99

•Edelman et al. ’91

•Ullman & Barsi, ’91

• Rothwell ‘92

•Linderberg, ’94

•Murase & Nayar ‘94

•Rothganger et al., ‘04

•Ferrari et al, ’05

•Brown & Lowe ’05

•Snavely et al ’06

•Yin & Collins, ‘07

•Ballard, ‘81

•Grimson & L.-Perez, ‘87

•Lowe, ’87

Single 3D object recognition

Cou

rtesy o

f D. L

ow

e

Courte

sy o

f Roth

ganger e

t al

Difference of Gaussian (DOG): used in Lowe 99, Brown et al ‘05

Harris-Laplace: used in Rothganger et al. ‘06 Laplacian: used in Lazebnik et al. ‘04

Object representation:

Collection of patches in 3DRothganger et al. ’06

x,y,z +

h,v +

descriptor

Cou

rtesy o

f Roth

ga

ng

er

et a

l

Model learning Rothganger et al. ‘03 ’06

• N images of object from N

different view points

• Match key points between

consecutive views

[ create sample set]

•Use affine structure from motion to

compute 3D location and orientation +

camera locations [RANSAC]

• Upgrade model to Euclidean assuming

zero skew and square pixels

• Find connected components

• Use bundle adjustment to refine model

Build a 3D model:

Recognition[Rothganger et al. ‘03 ’06]

1. Find matches between model and test image

features


1. Find matches between model and test image

features

2. Generate hypothesis: •Compute transformation M from N matches (N=2; affine camera; affine key points)

3. Model verification• Use M to project other matched 3D model features into test image

• Compute residual = D(projections, measurements)


Goal:Estimate (fit) the best M in presence of outliers

Multi-view models

















• Sun et al 08

Full 3D

models

3D model

instance

3D model

instance

…

3D category

model




•Kazhdan et al.03

•Osada et al ‘02


•Johnson & Herbert ’99

•Amberg et al ‘08

A 3D model category is built from a collection of 3D range data or CAD models

Osada et al 02Shape distributions

Kazhdan et al. 03

Spherical harmonics

Full 3D

models

3D model

instance

3D model

instance

…

3D category

model

- Build a 3d model is expensive

- Difficult to incorporate appearance information

- Need to cope with 3D alignment (orientation, scale, etc…)




•Kazhdan et al.03

•Osada et al ‘02


•Johnson & Herbert ’99

•Amberg et al ‘08

A 3D model category is built from a collection of 3D range data or CAD models

Multi-view models

















• Sun et al 08

Multi-view models

…

Sparse set of interest points or parts of the objects are linked across views.

…

…

3D Category

model

Yan, et al. ’07

Multi-view models by rough 3d shapes

Hoiem, et al., ’07

Multi-view models by rough 3d shapes

‘Cou

rtesy o

f Th

om

as e

t al. 0

6

[Thomas et al. ’06]

Multi-view models by ISM representations

…

Sparse set of interest points or parts of the objects are linked across views.

…

…

Multi-view

model

[Thomas et al. ’06]

Multi-view models by ISM representations

A unified framework for 3D objectdetection, pose classification, pose synthesis

Savarese, Fei-Fei, ICCV 07

Savarese, Fei-Fei, ECCV 08

Sun, Su, Savarese, Fei-Fei, CVPR 09

Su, Sun, Fei-Fei, Savarese, ICCV 09

• Canonical parts captures diagnostic appearance information

• 2d ½ structure linking parts via weak geometry

Canonical parts

• If physical part is planar, canonical part is stable point on the manifold

• Canonical part can be computed from connected component of parts

connected

component of parts

Canonical parts

connected

component of parts

Linkage structure

Object Recognition

1. Find hypotheses of canonical parts consistent with a given pose

Query imagemodel

Algorithm

Object Recognition


Query image

2. Infer position and pose of other canonical parts

model

Algorithm

Object RecognitionQuery image

model



Algorithm

Object RecognitionQuery image

3. Optimize over E, G and s to find best combination of hypothesis

model



Algorithm

error






• Probabilistic generative part-based model

• Dense Multi-view representation on the viewing sphere






10:50am Thursday, Oct 1, Oral session 9

H. Su*, M. Sun*, L. Fei-Fei, S. Savarese, Learning a Dense

Multi-view Representation for Detection, Viewpoint

Classification and Synthesis of Object Categories.


Single

view

Mixture/

Multi-

view

Sav. et al,

07

Morphing

model (Su,

Sun, et al. 09)

View point invariant X √ √ √No supervision

√ X Xall views all instances

available

√

# Categories ~300 2 8 16

Share information

across viewsX √ √ √

View synthesis X X X √Pose estimation

√ X √ √

View point

iccv2009 recognition and learning object categories p1 c03 - 3d object models

Education