Real-Time Human Pose Recognition in Parts from Single Depth Images. Jamie Shotton, Andrew Fitzgibbon, Mat Cook, Toby Sharp, Mark Finocchio, Richard Moore, Alex Kipman, Andrew Blake. Microsoft Research Cambridge & Xbox Incubation. CVPR 2011 Best Paper.

Upload: meghana-d-bengalur

Post on 14-Apr-2015

DESCRIPTION

This is a real-time pose recognition project that recognizes human pose.

TRANSCRIPT

Page 1: 110922_  Real-Time Human Pose Recognition in Parts from Single Depth Images.pptx

Real-Time Human Pose Recognition in Parts from Single Depth Images

Jamie Shotton Andrew Fitzgibbon Mat Cook

Toby Sharp Mark Finocchio Richard Moore

Alex Kipman Andrew Blake

Microsoft Research Cambridge & Xbox Incubation

CVPR 2011 Best Paper

Page 2: 110922_  Real-Time Human Pose Recognition in Parts from Single Depth Images.pptx
Page 3: 110922_  Real-Time Human Pose Recognition in Parts from Single Depth Images.pptx

OUTLINE

• Introduction
• Data
• Body Part Inference and Joint Proposals
• Experiments
• Discussion

Page 4: 110922_  Real-Time Human Pose Recognition in Parts from Single Depth Images.pptx

Introduction

• Robust interactive human body tracking
– applications: gaming, human-computer interaction, security, telepresence, health care
• Real-time depth cameras
– existing systems track from frame to frame but struggle to re-initialize quickly, so they are not robust
– our focus: per-frame initialization, designed to complement a tracking algorithm
• Focus on pose recognition in parts
– 3D position candidates for each skeletal joint

Page 5: 110922_  Real-Time Human Pose Recognition in Parts from Single Depth Images.pptx

Introduction

• Appropriate tracking algorithms
– Tracking people with twists and exponential maps (CVPR 1998)
– Tracking loose-limbed people (CVPR 2004)
– Nonlinear body pose estimation from depth images (DAGM 2005)
– Real-time hand-tracking with a color glove (ACM 2009)
– Real time motion capture using a single time-of-flight camera (CVPR 2010)

Page 6: 110922_  Real-Time Human Pose Recognition in Parts from Single Depth Images.pptx

Introduction

• Inspired by recent object recognition work that divides objects into parts
– Object class recognition by unsupervised scale-invariant learning [CVPR 2003]
– The layout consistent random field for recognizing and segmenting partially occluded objects [CVPR 2006]
• Two key design goals
– computational efficiency
– robustness

Page 7: 110922_  Real-Time Human Pose Recognition in Parts from Single Depth Images.pptx

Introduction

Depth image
→ segment into a dense probabilistic body part labeling, with parts spatially localized near skeletal joints
→ generate 3D joint proposals

Page 8: 110922_  Real-Time Human Pose Recognition in Parts from Single Depth Images.pptx

Introduction

• We treat the segmentation into body parts as a per-pixel classification task
– each pixel is evaluated separately
• Training data
– generate realistic synthetic depth images
– train a deep randomized decision forest classifier while avoiding overfitting

Page 9: 110922_  Real-Time Human Pose Recognition in Parts from Single Depth Images.pptx

Introduction

• Overfitting
• Simple, discriminative depth comparison image features
• Maintaining high computational efficiency

Page 10: 110922_  Real-Time Human Pose Recognition in Parts from Single Depth Images.pptx

Introduction

• For further speed, the classifier can be run in parallel on each pixel on a GPU
• Spatial modes of the inferred per-pixel distributions are computed using mean shift, resulting in the 3D joint proposals

Page 11: 110922_  Real-Time Human Pose Recognition in Parts from Single Depth Images.pptx

What is Mean Shift?

A tool for finding modes in a set of data samples, manifesting an underlying probability density function (PDF) in R^N

PDF in feature space
• Color space
• Scale space
• Actually any feature space you can conceive
• …

Diagram labels: Data; Non-parametric density estimation; Discrete PDF representation; Non-parametric density gradient estimation (mean shift); PDF analysis

Page 12: 110922_  Real-Time Human Pose Recognition in Parts from Single Depth Images.pptx

Intuitive Description

Distribution of identical billiard balls

Region of interest

Center of mass

Mean shift vector

Objective: Find the densest region
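
The billiard-ball picture can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not code from the slides: it uses a flat (uniform) circular window, and the names `mean_shift_flat`, `balls`, `radius` and the sample data are all hypothetical. The window is repeatedly moved to the center of mass of the points inside it, and the displacement at each step is the mean shift vector.

```python
import numpy as np

def mean_shift_flat(points, start, radius, max_iter=100, tol=1e-6):
    """Move a circular window to the center of mass of the points it contains
    until the window stops moving; returns the final window center."""
    center = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        inside = points[np.linalg.norm(points - center, axis=1) <= radius]
        if len(inside) == 0:
            break  # empty region of interest: nothing to move toward
        new_center = inside.mean(axis=0)            # center of mass
        shift = np.linalg.norm(new_center - center)  # length of the mean shift vector
        center = new_center
        if shift < tol:
            break
    return center

# Hypothetical "billiard balls": one dense cluster at the origin plus sparse clutter.
rng = np.random.default_rng(0)
balls = np.vstack([
    rng.normal([0.0, 0.0], 0.3, size=(200, 2)),
    rng.uniform(-3.0, 3.0, size=(50, 2)),
])
mode = mean_shift_flat(balls, start=[0.8, 0.8], radius=1.0)
```

Starting the window off-center and letting it drift toward the cluster mirrors the animation on these slides; a Gaussian window would weight nearby balls more heavily instead of using a hard cutoff.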

Page 19: 110922_  Real-Time Human Pose Recognition in Parts from Single Depth Images.pptx

Main contribution

• Treat pose estimation as object recognition
– using a novel intermediate body parts representation
– designed to spatially localize joints
– at low computational cost and high accuracy

Page 20: 110922_  Real-Time Human Pose Recognition in Parts from Single Depth Images.pptx

Experiments

• (i) Synthetic depth training data is an excellent proxy for real data
• (ii) Scaling up the learning problem with varied synthetic data is important for high accuracy
• (iii) Our parts-based approach generalizes better than even an oracular exact nearest neighbor

Page 21: 110922_  Real-Time Human Pose Recognition in Parts from Single Depth Images.pptx

Data

• Depth imaging and motion capture data
• Pose estimation research
– has often focused on techniques
– to overcome the lack of training data
• Two problems when synthesizing training images
– color variability
– pose variability

Page 22: 110922_  Real-Time Human Pose Recognition in Parts from Single Depth Images.pptx

Depth image

• Use real mocap data
– retargeted to a variety of base character models
– to synthesize a large, varied dataset
– camera output: 640x480 images at 30 frames per second
• Advantages of depth cameras over traditional intensity sensors
– working in low light levels
– giving a calibrated scale estimate
– resolving silhouette ambiguities in pose

Page 23: 110922_  Real-Time Human Pose Recognition in Parts from Single Depth Images.pptx

Motion capture data

• Capture a large database of motion capture (mocap) of human actions
– approximately 500k frames
– (driving, dancing, kicking, running, navigating menus)
• Need not record mocap with variation in
– rotation about the vertical axis, mirroring left-right, scene position, body shape and size, camera pose
– all of which can be added in (semi-)automatically
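
Two of the variations that can be added automatically, rotation about the vertical axis and left-right mirroring, can be sketched as simple transforms on joint positions. This is an illustrative sketch only: it assumes poses are stored as (J, 3) arrays with y as the vertical axis, and the function names, the three-joint pose and the `swap_pairs` index list are hypothetical.

```python
import numpy as np

def rotate_about_vertical(joints, angle_rad):
    """Rotate (J, 3) joint positions about the vertical axis (assumed to be y)."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    rot = np.array([[c, 0.0, s],
                    [0.0, 1.0, 0.0],
                    [-s, 0.0, c]])
    return joints @ rot.T

def mirror_left_right(joints, swap_pairs):
    """Negate x, then swap the rows of each (left, right) joint pair so that
    the joint labelled 'left' is still on the character's left side."""
    mirrored = joints * np.array([-1.0, 1.0, 1.0])
    for left, right in swap_pairs:
        mirrored[[left, right]] = mirrored[[right, left]]
    return mirrored

# Hypothetical 3-joint pose in metres: head, left hand, right hand.
pose = np.array([[0.0, 1.7, 0.0],
                 [0.4, 1.0, 0.1],
                 [-0.5, 1.2, 0.0]])
turned = rotate_about_vertical(pose, np.pi / 2)
flipped = mirror_left_right(pose, swap_pairs=[(1, 2)])
```

Because both transforms are cheap, one recorded mocap frame can stand in for many facing directions and for its mirror image without extra capture sessions.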

Page 24: 110922_  Real-Time Human Pose Recognition in Parts from Single Depth Images.pptx

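Motion capture data

• The classifier uses no temporal information
– static poses
– not motion
• Changes in pose from one frame to the next are often so small as to be insignificant
– similar, redundant poses are discarded using a 'furthest neighbor' clustering algorithm
– where the distance between poses $p_1$ and $p_2$ is $\max_j \lVert p_1^j - p_2^j \rVert_2$
– $j$ indexes the body joints, $p_i$ is the $i$-th pose
– keep a subset in which no two poses are closer than 5 cm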
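
The pose-pruning step can be sketched as a greedy farthest-point selection. This is one plausible reading of 'furthest neighbor' clustering, not the authors' implementation; the function names and the toy poses are hypothetical, and poses are assumed to be (J, 3) arrays in metres.

```python
import numpy as np

def pose_distance(p1, p2):
    """Maximum Euclidean distance over joints between two (J, 3) poses."""
    return np.linalg.norm(p1 - p2, axis=1).max()

def furthest_neighbor_subset(poses, min_dist=0.05):
    """Greedily add the pose furthest from everything kept so far, stopping
    once every remaining pose is closer than min_dist to a kept pose."""
    selected = [0]
    nearest = np.array([pose_distance(p, poses[0]) for p in poses])
    while True:
        candidate = int(nearest.argmax())
        if nearest[candidate] < min_dist:
            break
        selected.append(candidate)
        to_new = np.array([pose_distance(p, poses[candidate]) for p in poses])
        nearest = np.minimum(nearest, to_new)
    return selected

# Toy data: a base pose, a near-duplicate (about 1.7 cm away), and a distinct pose.
base = np.zeros((2, 3))
poses = np.stack([base, base + 0.01, base + 0.2])
kept = furthest_neighbor_subset(poses, min_dist=0.05)
```

In this toy run the near-duplicate is dropped and the distinct pose is kept, so no two retained poses are closer than 5 cm. The naive loop recomputes distances to every pose per iteration, which is fine for a sketch but would need batching for hundreds of thousands of frames.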

Page 25: 110922_  Real-Time Human Pose Recognition in Parts from Single Depth Images.pptx

Motion capture data

• Necessary to iterate the process of
– motion capture
– sampling from our model
– training the classifier
– testing joint prediction accuracy
• Early experiments used the CMU mocap database

Page 26: 110922_  Real-Time Human Pose Recognition in Parts from Single Depth Images.pptx

Generating synthetic data

• Build a randomized rendering pipeline
– from which to sample fully labeled training images
• Goals
– realism and variety

Page 27: 110922_  Real-Time Human Pose Recognition in Parts from Single Depth Images.pptx

Generating synthetic data

• First: randomly sample a set of parameters
• Then use standard computer graphics techniques
– render depth and body part images
– from texture-mapped 3D meshes
• Use Autodesk MotionBuilder
– slight random variation in height and weight gives extra coverage of body shapes
– other randomized parameters

Page 28: 110922_  Real-Time Human Pose Recognition in Parts from Single Depth Images.pptx

Generating synthetic data

Page 29: 110922_  Real-Time Human Pose Recognition in Parts from Single Depth Images.pptx

Body Part Inference and Joint Proposals

• Body part labeling
• Depth image features
• Randomized decision forests
• Joint position proposals

Page 30: 110922_  Real-Time Human Pose Recognition in Parts from Single Depth Images.pptx

Body part labeling

• Intermediate body part representation
– shown as color-coded labels
– some parts directly localize particular skeletal joints
– others fill the gaps
• Transforms the problem into one that can readily be solved by efficient classification algorithms

Page 31: 110922_  Real-Time Human Pose Recognition in Parts from Single Depth Images.pptx

Body part labeling

• The parts are specified in a texture map

Page 32: 110922_  Real-Time Human Pose Recognition in Parts from Single Depth Images.pptx

Body part labeling

• 31 body parts: LU/RU/LW/RW head, neck, L/R shoulder, LU/RU/LW/RW arm, L/R elbow, L/R wrist, L/R hand, LU/RU/LW/RW torso, LU/RU/LW/RW leg, L/R knee, L/R ankle, L/R foot (Left, Right, Upper, loWer)

Page 33: 110922_  Real-Time Human Pose Recognition in Parts from Single Depth Images.pptx

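Depth image features

• Feature: $f_\theta(I, x) = d_I\left(x + \frac{u}{d_I(x)}\right) - d_I\left(x + \frac{v}{d_I(x)}\right)$
• $d_I(x)$ is the depth at pixel $x$ in image $I$
• $\theta = (u, v)$ describes offsets $u$ and $v$
• Normalizing the offsets by $1/d_I(x)$ ensures the features are depth invariant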
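
A depth comparison feature of this kind, the difference between two depth probes whose pixel offsets are scaled by the inverse depth at the reference pixel, can be sketched as follows. This is a hedged illustration rather than the authors' code: the helper names, the (row, column) pixel convention, the rounding of probe positions and the value of `BACKGROUND_DEPTH` are assumptions.

```python
import numpy as np

BACKGROUND_DEPTH = 1e6  # assumed large constant for probes that leave the image

def probe(depth, pixel):
    """Depth at a (row, col) position, or a large constant outside the image."""
    row, col = int(round(pixel[0])), int(round(pixel[1]))
    height, width = depth.shape
    if 0 <= row < height and 0 <= col < width:
        return depth[row, col]
    return BACKGROUND_DEPTH

def depth_feature(depth, x, u, v):
    """Difference of two probes at offsets u and v, each divided by the depth at x."""
    x = np.asarray(x, dtype=float)
    d = depth[int(x[0]), int(x[1])]
    return probe(depth, x + np.asarray(u) / d) - probe(depth, x + np.asarray(v) / d)

# Toy scene: a surface at 2 m with a farther surface at 4 m on the right.
depth = np.full((9, 9), 2.0)
depth[:, 6:] = 4.0
edge_response = depth_feature(depth, x=(4, 4), u=(0, 4), v=(0, 0))
off_image = depth_feature(depth, x=(4, 4), u=(0, 20), v=(0, 0))
```

Dividing the offsets by the depth means that a body twice as far away, which appears half as large in the image, is probed at half the pixel distance, so the same feature lands on the same physical location.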

Page 34: 110922_  Real-Time Human Pose Recognition in Parts from Single Depth Images.pptx

Depth image features

• Individually these features provide only a weak signal
• In combination in a decision forest they are sufficient to accurately disambiguate all trained parts

Page 35: 110922_  Real-Time Human Pose Recognition in Parts from Single Depth Images.pptx

Depth image features

• The design of these features was strongly motivated by their computational efficiency
– no preprocessing is needed
– each feature reads at most 3 image pixels
– and performs at most 5 arithmetic operations
– straightforwardly implemented on the GPU

Page 36: 110922_  Real-Time Human Pose Recognition in Parts from Single Depth Images.pptx

Randomized decision forests

• Randomized decision forests
– fast and effective multi-class classifiers
– implemented efficiently on the GPU
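
How a forest classifies one pixel can be sketched with a toy data structure. The sketch assumes each split node stores a feature parameter and a threshold, each leaf stores a distribution over body part classes, and the forest averages the leaf distributions of its trees; the nested-dict layout, the key names and the stand-in feature values are hypothetical.

```python
import numpy as np

def tree_predict(node, feature_fn):
    """Descend from the root: go left if the feature response is below the
    node's threshold, otherwise right, until a leaf distribution is reached."""
    while "dist" not in node:
        branch = "left" if feature_fn(node["theta"]) < node["tau"] else "right"
        node = node[branch]
    return np.asarray(node["dist"], dtype=float)

def forest_predict(trees, feature_fn):
    """Average the leaf distributions reached in every tree."""
    return np.mean([tree_predict(tree, feature_fn) for tree in trees], axis=0)

# Two hypothetical depth-1 trees over two body part classes.
trees = [
    {"theta": 0, "tau": 0.5,
     "left": {"dist": [0.9, 0.1]}, "right": {"dist": [0.2, 0.8]}},
    {"theta": 1, "tau": 0.0,
     "left": {"dist": [0.6, 0.4]}, "right": {"dist": [0.1, 0.9]}},
]
feature_values = {0: 0.7, 1: -0.3}  # stand-in feature responses at one pixel
posterior = forest_predict(trees, feature_values.get)
```

Each pixel is evaluated independently of every other pixel, which is what makes the classifier easy to run in parallel on a GPU.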

Page 37: 110922_  Real-Time Human Pose Recognition in Parts from Single Depth Images.pptx

Randomized decision forests

Page 38: 110922_  Real-Time Human Pose Recognition in Parts from Single Depth Images.pptx

Randomized decision forests

Page 39: 110922_  Real-Time Human Pose Recognition in Parts from Single Depth Images.pptx

Joint position proposals

• Generate reliable proposals for the positions of 3D skeletal joints
– the final output of our algorithm
– used by a tracking algorithm to self-initialize
– and recover from failure

Page 40: 110922_  Real-Time Human Pose Recognition in Parts from Single Depth Images.pptx

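Joint position proposals

• A local mode-finding approach based on mean shift with a weighted Gaussian kernel
– density estimator per body part $c$: $f_c(\hat{x}) \propto \sum_{i=1}^{N} w_{ic} \exp\left(-\left\lVert \frac{\hat{x} - \hat{x}_i}{b_c} \right\rVert^2\right)$
– $\hat{x}_i$ is the reprojection of image pixel $x_i$ into world space given depth $d_I(x_i)$
– $b_c$ is a learned per-part bandwidth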

Page 41: 110922_  Real-Time Human Pose Recognition in Parts from Single Depth Images.pptx

Non-Parametric Density Estimation

Assumption: The data points are sampled from an underlying PDF

Figure labels: Assumed underlying PDF; Real data samples

Data point density implies PDF value!
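
The idea that point density implies the PDF value can be made concrete with a kernel density estimate. The sketch below is an assumption-laden illustration: it uses a one-dimensional Gaussian kernel with a hand-picked bandwidth, and the function name and the two-mode sample data are hypothetical.

```python
import numpy as np

def gaussian_kde_1d(samples, query, bandwidth):
    """Estimate the density at each query point as the average of Gaussian
    bumps of width `bandwidth` centered on the samples."""
    samples = np.asarray(samples, dtype=float)
    query = np.asarray(query, dtype=float)
    diffs = (query[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * diffs**2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return kernels.mean(axis=1)

# Hypothetical samples drawn from a two-mode distribution.
rng = np.random.default_rng(1)
samples = np.concatenate([rng.normal(-2.0, 0.5, 300), rng.normal(3.0, 1.0, 300)])
grid = np.linspace(-6.0, 8.0, 281)
density = gaussian_kde_1d(samples, grid, bandwidth=0.4)
```

No functional form for the PDF is assumed here; regions where samples crowd together simply receive more overlapping bumps and therefore a higher estimate.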

Page 44: 110922_  Real-Time Human Pose Recognition in Parts from Single Depth Images.pptx

Parametric Density Estimation

Assumption: The data points are sampled from an underlying PDF

$\mathrm{PDF}(x) = \sum_i c_i \, e^{-\frac{(x - \mu_i)^2}{2\sigma_i^2}}$

Estimate

Figure labels: Assumed underlying PDF; Real data samples

Page 45: 110922_  Real-Time Human Pose Recognition in Parts from Single Depth Images.pptx

Joint position proposals

• The pixel weight $w_{ic}$ considers both the inferred body part probability at the pixel and the world surface area of the pixel: $w_{ic} = P(c \mid I, x_i) \cdot d_I(x_i)^2$

Page 46: 110922_  Real-Time Human Pose Recognition in Parts from Single Depth Images.pptx

Joint position proposals

• The detected modes
– lie on the surface of the body
– are pushed back into the scene by a learned z offset to produce a final joint position proposal
• Mean bandwidth $b_c$ = 0.065 m
• Mean probability threshold $\lambda_c$ = 0.14
• Mean z offset = 0.039 m
• Parameters optimized per part by grid search on a hold-out validation set of 5000 images
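
The mode-finding step for a single body part can be sketched as weighted mean shift with a Gaussian kernel, followed by the z push-back. This is a simplified sketch, not the authors' implementation: it runs from a single starting point, assumes the camera looks along +z so that pushing back means adding to z, uses the depth squared as the surface-area proxy in the weights, and all names and sample data are hypothetical. The default bandwidth and offset are the values quoted on the slide.

```python
import numpy as np

def joint_proposal(points, probs, depths, bandwidth=0.065, z_offset=0.039,
                   start=None, n_iter=50):
    """Weighted Gaussian mean shift over the world-space pixels of one part."""
    weights = probs * depths**2  # part probability times a surface-area proxy
    if start is None:
        x = points[np.argmax(weights)].astype(float)  # start at the heaviest pixel
    else:
        x = np.asarray(start, dtype=float)
    for _ in range(n_iter):
        kernel = weights * np.exp(-np.sum(((points - x) / bandwidth) ** 2, axis=1))
        x = (kernel[:, None] * points).sum(axis=0) / kernel.sum()
    # The mode lies on the body surface; push it back into the scene along z.
    return x + np.array([0.0, 0.0, z_offset])

# Hypothetical part: a tight, confident cluster plus low-probability clutter.
rng = np.random.default_rng(2)
cluster = rng.normal([0.1, 1.2, 2.0], 0.01, size=(200, 3))
clutter = rng.uniform([-0.2, 0.9, 1.7], [0.4, 1.5, 2.3], size=(20, 3))
points = np.vstack([cluster, clutter])
probs = np.concatenate([np.full(200, 0.9), np.full(20, 0.05)])
proposal = joint_proposal(points, probs, depths=points[:, 2])
```

A fuller version would start mean shift from every pixel whose part probability exceeds the threshold and merge the modes found, which is how several proposals per joint can arise.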

Page 47: 110922_  Real-Time Human Pose Recognition in Parts from Single Depth Images.pptx

Joint position proposals

Page 48: 110922_  Real-Time Human Pose Recognition in Parts from Single Depth Images.pptx

Experiments

• Further results are provided in the supplementary material
• Default parameters
– 3 trees, 20 deep, 300k training images per tree
– 2000 training example pixels per image
– 2000 candidate features $\theta$
– 50 candidate thresholds $\tau$ per feature
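
The candidate features and thresholds listed above are what a tree node chooses among during training. A common way to score each (feature, threshold) pair is information gain with Shannon entropy; the sketch below assumes that criterion, and the function names and the tiny data set are hypothetical.

```python
import numpy as np

def entropy(labels, n_classes):
    """Shannon entropy (in bits) of the class labels in one set."""
    counts = np.bincount(labels, minlength=n_classes)
    p = counts[counts > 0] / len(labels)
    return -np.sum(p * np.log2(p))

def best_split(features, labels, thresholds, n_classes):
    """features is (N, F): responses of F candidate features on N pixels.
    Returns (gain, feature index, threshold) of the best candidate split."""
    parent = entropy(labels, n_classes)
    best = (-np.inf, None, None)
    for f in range(features.shape[1]):
        for tau in thresholds:
            left = features[:, f] < tau
            n_left = int(left.sum())
            if n_left == 0 or n_left == len(labels):
                continue  # degenerate split: one side is empty
            children = (n_left * entropy(labels[left], n_classes)
                        + (len(labels) - n_left) * entropy(labels[~left], n_classes))
            gain = parent - children / len(labels)
            if gain > best[0]:
                best = (gain, f, tau)
    return best

# Toy data: feature 0 is uninformative, feature 1 separates the two classes.
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
features = np.column_stack([
    [0.3, -0.2, 0.1, -0.4, 0.2, -0.3, 0.4, -0.1],
    [-1.0, -2.0, -1.5, -0.5, 1.0, 2.0, 1.5, 0.5],
])
gain, feature_index, tau = best_split(features, labels, [-0.25, 0.0, 0.25], n_classes=2)
```

With 2000 candidate features and 50 thresholds each, every node evaluates 100,000 such splits, which is why training at this scale is expensive.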

Page 49: 110922_  Real-Time Human Pose Recognition in Parts from Single Depth Images.pptx

Experiments

• Test data
– challenging synthetic and real depth images to evaluate our approach
– synthesize 5000 depth images
• Real test set
– 8808 frames of real depth images
– 15 different subjects
– 7 upper body joint positions

Page 50: 110922_  Real-Time Human Pose Recognition in Parts from Single Depth Images.pptx

Experiments

• Error metric: quantify both classification and joint prediction accuracy
– Classification accuracy
• average of the diagonal of the confusion matrix
• between the ground truth part label and the most likely inferred part label
– Joint prediction accuracy
• generate recall-precision curves as a function of confidence threshold
• quantify accuracy as average precision per joint
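
The classification metric can be sketched directly from its description. This sketch assumes the confusion matrix is row-normalized, so that each diagonal entry is the per-class accuracy; the function name and the labels are hypothetical.

```python
import numpy as np

def average_per_class_accuracy(true_labels, predicted_labels, n_classes):
    """Mean of the diagonal of the row-normalized confusion matrix."""
    true_labels = np.asarray(true_labels)
    predicted_labels = np.asarray(predicted_labels)
    confusion = np.zeros((n_classes, n_classes))
    np.add.at(confusion, (true_labels, predicted_labels), 1)
    row_sums = confusion.sum(axis=1, keepdims=True)
    normalized = confusion / np.maximum(row_sums, 1)  # classes with no pixels score 0
    return np.diag(normalized).mean()

# Toy labels: class 0 has more pixels than classes 1 and 2.
true_labels = [0, 0, 0, 0, 1, 1, 2, 2]
predicted = [0, 0, 0, 0, 1, 1, 2, 0]
accuracy = average_per_class_accuracy(true_labels, predicted, n_classes=3)
```

Averaging per class weights every body part equally regardless of its size in pixels, so a small part such as a hand counts as much as the torso. In the toy data the overall pixel accuracy is 7/8, while the per-class average is lower because the error falls on a small class.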

Page 51: 110922_  Real-Time Human Pose Recognition in Parts from Single Depth Images.pptx

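Experiments

• Error metric:
– only the first proposal within D meters of the ground truth counts as a true positive
– this penalizes multiple spurious detections near the correct position, which might slow a downstream tracking algorithm
• D = 0.1 m is used below, approximately the accuracy of the hand-labeled real test data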
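
The counting rule behind the joint precision metric can be sketched for one joint in one frame. The sketch assumes the proposals are already sorted by decreasing confidence and that every proposal after the first hit, as well as every proposal farther than D from the ground truth, counts as a false positive; the function name and coordinates are hypothetical.

```python
import numpy as np

def count_detections(proposals, ground_truth, d_max=0.1):
    """Return (true_positives, false_positives) for one joint in one frame."""
    true_pos, false_pos = 0, 0
    for p in proposals:  # assumed sorted by decreasing confidence
        within = np.linalg.norm(np.asarray(p) - ground_truth) <= d_max
        if within and true_pos == 0:
            true_pos = 1      # the first proposal within D is the true positive
        else:
            false_pos += 1    # duplicates near the joint and far misses both count
    return true_pos, false_pos

gt = np.array([0.0, 1.5, 2.0])
props = [[0.02, 1.5, 2.0], [0.05, 1.52, 2.0], [0.5, 1.5, 2.0]]
tp, fp = count_detections(props, gt)
```

Accumulating these counts over frames while sweeping the confidence threshold gives the recall-precision curve from which average precision is computed.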

Page 52: 110922_  Real-Time Human Pose Recognition in Parts from Single Depth Images.pptx

Experiments

Page 53: 110922_  Real-Time Human Pose Recognition in Parts from Single Depth Images.pptx

Experiments

Page 54: 110922_  Real-Time Human Pose Recognition in Parts from Single Depth Images.pptx

Experiments

Page 55: 110922_  Real-Time Human Pose Recognition in Parts from Single Depth Images.pptx

Experiments

Page 56: 110922_  Real-Time Human Pose Recognition in Parts from Single Depth Images.pptx

Experiments

Page 57: 110922_  Real-Time Human Pose Recognition in Parts from Single Depth Images.pptx

Experiments

• Real time motion capture using a single time-of-flight camera. [CVPR 2010]

Page 58: 110922_  Real-Time Human Pose Recognition in Parts from Single Depth Images.pptx

Discussion

• Accurate proposals
– for the 3D locations of body joints
– super real-time from single depth images
• Body part recognition
– as an intermediate representation
• A highly varied synthetic training set
– allows training very deep decision forests
– using depth-invariant features without overfitting

Page 59: 110922_  Real-Time Human Pose Recognition in Parts from Single Depth Images.pptx

Future work

• Study of the variability in the source mocap data
• Generative model underlying the synthesis pipeline
• A similarly efficient approach
– directly regress joint positions
– remove ambiguities in local pose

Page 60: 110922_  Real-Time Human Pose Recognition in Parts from Single Depth Images.pptx

Thank you