iccv2011 learning spatiotemporal graphs of human activities

Learning Spatiotemporal Graphs of Human Activities

Sinisa Todorovic, William Brendel

Upload: zukun

Post on 19-May-2015


TRANSCRIPT

Page 1: Iccv2011 learning spatiotemporal graphs of human activities

Learning Spatiotemporal Graphs of Human Activities

Sinisa Todorovic, William Brendel

Page 2: Iccv2011 learning spatiotemporal graphs of human activities

Our Goal

[Example videos: Long Jump, Triple Jump]

•  Recognize all occurrences of activities

•  Identify the start and end frames

•  Parse the video and find all subactivities

•  Localize actors and objects involved

Page 3: Iccv2011 learning spatiotemporal graphs of human activities

Weakly Supervised Setting

In training: ONLY class labels

Domain knowledge of temporal structure: NOT AVAILABLE

[Example videos: Weight Lifting, Large-Box Lifting]

Page 4: Iccv2011 learning spatiotemporal graphs of human activities

Learning What and How

Weak supervision in training

Need to learn from training videos:

What activity parts are relevant

How relevant they are for recognition

Page 5: Iccv2011 learning spatiotemporal graphs of human activities

Prior Work vs. Our Approach

Typically, focus only on HOW

[Diagram labels: raw video, features, model, gap, semantic level]

Page 6: Iccv2011 learning spatiotemporal graphs of human activities

Prior Work vs. Our Approach

[Diagram: two pipelines from raw video to the semantic level. Typically: features, then a model, leaving a gap to the semantic level. Our approach: mid-level features, then a model.]

Page 7: Iccv2011 learning spatiotemporal graphs of human activities

Prior Work – Video Representation

•  Space-time points – Laptev & Schmid 08, Niebles & Fei-Fei 08, …

•  Still human postures – Soatto 07, Ning & Huang 08, …

•  Action templates – Yao & Zhu 09, …

•  Point tracks – Sukthankar & Hebert 10, …

Page 8: Iccv2011 learning spatiotemporal graphs of human activities

Our Features: 2D+t Tubes

Sukthankar & Hebert 07, Gorelick & Irani 08, Pritch & Peleg 08, ...

•  Allow simpler:

- Modeling

- Learning (few examples)

- Inference

•  We are the first to use 2D+t tubes for building a statistical model of activities

Page 9: Iccv2011 learning spatiotemporal graphs of human activities

Our Features: 2D+t Tubes

•  Allow simpler:

- Modeling

- Learning (few examples)

- Inference

•  We use 2D+t tubes for building a statistical generative model of activities

Sukthankar & Hebert 07, Gorelick & Irani 08, Pritch & Peleg 08, ...

Page 10: Iccv2011 learning spatiotemporal graphs of human activities

Prior Work – Activity Representation

•  Graphical models, grammars – Ivanov & Bobick 00, Xiang & Gong 06, Ryoo & Aggarwal 09, Gupta & Davis 09, Liu & Zhu 09, Niebles & Fei-Fei 10, Lan et al. 11

•  Probabilistic first-order logic – Tran & Davis 08, Albanese et al. 10, Morariu & Davis 11, Brendel et al. 11, ...

Page 11: Iccv2011 learning spatiotemporal graphs of human activities

Approach

Input Video → Spatiotemporal Graph → Activity Model → Recognition, Localization

Page 12: Iccv2011 learning spatiotemporal graphs of human activities

Blocky Video Segmentation

Page 13: Iccv2011 learning spatiotemporal graphs of human activities

Activity as a Spatiotemporal Graph

Descriptors of nodes and edges:

•  Node descriptors: F

- Motion

- Object shape

•  Adjacency matrices: {Ai}

- Allen temporal relations

- Spatial relations

- Compositional relations
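
To make the temporal adjacency matrices concrete, here is a minimal sketch of how Allen's interval relations could be computed between tubes. It assumes each tube is summarized by a hypothetical `(start_frame, end_frame)` pair with `start < end`; the function names and the 0/1 matrix encoding are illustrative assumptions, not taken from the slides.

```python
def allen_relation(a, b):
    """Return the Allen relation of interval a = (start, end) relative to b.

    Assumes start < end for both intervals.
    """
    (a0, a1), (b0, b1) = a, b
    if a1 < b0:
        return "before"
    if b1 < a0:
        return "after"
    if a1 == b0:
        return "meets"
    if b1 == a0:
        return "met-by"
    if a0 == b0 and a1 == b1:
        return "equals"
    if a0 == b0:
        return "starts" if a1 < b1 else "started-by"
    if a1 == b1:
        return "finishes" if a0 > b0 else "finished-by"
    if b0 < a0 and a1 < b1:
        return "during"
    if a0 < b0 and b1 < a1:
        return "contains"
    # Only partial overlap remains at this point.
    return "overlaps" if a0 < b0 else "overlapped-by"


def relation_matrix(intervals, relation):
    """Binary adjacency matrix: entry (i, j) is 1 if tube i stands in
    `relation` to tube j, else 0. The diagonal is left at 0."""
    n = len(intervals)
    return [
        [
            1 if i != j and allen_relation(intervals[i], intervals[j]) == relation else 0
            for j in range(n)
        ]
        for i in range(n)
    ]
```

Building one such matrix per relation gives one member of the set {Ai}; spatial and compositional relations would need their own predicates.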

Page 14: Iccv2011 learning spatiotemporal graphs of human activities

Activity as Segmentation Graph

G = (V, E, "descriptors") = (F, {A1, ..., An})

F: node descriptors

{A1, ..., An}: adjacency matrices of distinct relations between the tubes
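
One way to hold G = (F, {A1, ..., An}) in code is sketched below. The class name `ActivityGraph`, the relation names used as dictionary keys, and the shape check are assumptions made for illustration; the slides do not specify a data layout.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ActivityGraph:
    """Container for a segmentation graph G = (F, {A1, ..., An})."""

    # One descriptor row per tube (node), e.g. motion and shape features.
    F: List[List[float]]
    # Relation name -> square adjacency matrix over the tubes.
    A: Dict[str, List[List[float]]]

    def __post_init__(self):
        n = len(self.F)
        for name, mat in self.A.items():
            if len(mat) != n or any(len(row) != n for row in mat):
                raise ValueError(f"adjacency matrix '{name}' must be {n}x{n}")

    @property
    def num_nodes(self) -> int:
        return len(self.F)
```

For example, `ActivityGraph(F=[[0.1], [0.4]], A={"temporal": [[0, 1], [0, 0]]})` describes two tubes linked by one temporal edge.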

Page 15: Iccv2011 learning spatiotemporal graphs of human activities

Activity Graph Model

Probabilistic Graph Mixture

[Diagram: the compositional, spatial, and temporal model adjacency matrices, each multiplied by a mixture weight and summed, shown together with the model node descriptors.]

Page 16: Iccv2011 learning spatiotemporal graphs of human activities

Activity Model

An activity instance: G = (F, {A1, ..., An})

Model adjacency matrices

Edge type: i = 1, 2, ..., n

Page 17: Iccv2011 learning spatiotemporal graphs of human activities

Activity Model

An activity instance: G = (F, {A1, ..., An})

Model adjacency matrices

Edge type: i = 1, 2, ..., n

Model matrix of node descriptors

Page 18: Iccv2011 learning spatiotemporal graphs of human activities

Inference

Input Video → Spatiotemporal Graph → Activity Model → Recognition, Localization

Page 19: Iccv2011 learning spatiotemporal graphs of human activities

Inference = Robust Least Squares

Goal:

•  For every activity model

•  Estimate the permutation matrix

subject to
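
The slide names robust least squares over a permutation matrix, but its objective and constraints are not preserved in this transcript. The following is only a minimal sketch of the general idea, assuming plain (non-robust) squared error, equal node counts in the video graph and the model, and exhaustive search over permutations, which is feasible only for very small graphs. The names `matching_cost`, `best_match`, and `classify` are hypothetical.

```python
from itertools import permutations


def matching_cost(F, As, model_F, model_As, perm):
    """Squared error when video node k is matched to model node perm[k]."""
    n = len(F)
    cost = 0.0
    # Node term: compare each video descriptor with its matched model descriptor.
    for k in range(n):
        cost += sum((x - y) ** 2 for x, y in zip(F[k], model_F[perm[k]]))
    # Edge term: compare every adjacency matrix under the same node matching.
    for A, M in zip(As, model_As):
        for r in range(n):
            for c in range(n):
                cost += (A[r][c] - M[perm[r]][perm[c]]) ** 2
    return cost


def best_match(F, As, model_F, model_As):
    """Return (cost, perm) minimizing the squared error over all permutations."""
    n = len(F)
    return min(
        ((matching_cost(F, As, model_F, model_As, p), p) for p in permutations(range(n))),
        key=lambda t: t[0],
    )


def classify(F, As, models):
    """Pick the activity whose model matches the video graph with least error.

    `models` maps an activity name to a (model_F, model_As) pair.
    """
    scores = {name: best_match(F, As, mF, mAs)[0] for name, (mF, mAs) in models.items()}
    return min(scores, key=scores.get), scores
```

The permutation returned by `best_match` also says which video tubes correspond to which model nodes, which is what localization would rely on in this simplified setting.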

Page 20: Iccv2011 learning spatiotemporal graphs of human activities

Learning the Activity Graph Model

Training videos → Training graphs → Graph model

Page 21: Iccv2011 learning spatiotemporal graphs of human activities

Learning

Adjacency matrix, node descriptor

Edge type: i = 1, 2, ..., n

Given K training graphs,

Page 22: Iccv2011 learning spatiotemporal graphs of human activities

Learning

Given K training graphs, ESTIMATE

Model parameters: adjacency matrix, node descriptor

Page 23: Iccv2011 learning spatiotemporal graphs of human activities

Learning

Given K training graphs, ESTIMATE

Adjacency matrix, node descriptor

Permutation matrix

Page 24: Iccv2011 learning spatiotemporal graphs of human activities

Learning = Robust Least Squares

Estimate:  and

Given K training graphs:

Page 25: Iccv2011 learning spatiotemporal graphs of human activities

Learning = Structural EM

Estimation of model parameters

Estimation of permutation matrices

E-step → expected model structure

M-step → matching of the training graphs and model
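
The alternation between estimating model parameters and estimating permutation matrices can be illustrated with a much simpler stand-in for structural EM: alternating least squares. The sketch below assumes all training graphs have the same number of nodes, uses exhaustive permutation search, initializes the model from the first training graph, and averages the aligned graphs. All function names are hypothetical, and none of this reproduces the authors' actual update equations, which the transcript does not preserve.

```python
from itertools import permutations


def cost(F, As, mF, mAs, perm):
    """Squared error when graph node k is matched to model node perm[k]."""
    n = len(F)
    c = sum((x - y) ** 2 for k in range(n) for x, y in zip(F[k], mF[perm[k]]))
    c += sum(
        (A[r][s] - M[perm[r]][perm[s]]) ** 2
        for A, M in zip(As, mAs)
        for r in range(n)
        for s in range(n)
    )
    return c


def align(F, As, perm):
    """Reorder a training graph into the model's node order."""
    n = len(F)
    aF = [None] * n
    for k in range(n):
        aF[perm[k]] = list(F[k])
    aAs = []
    for A in As:
        aA = [[0.0] * n for _ in range(n)]
        for r in range(n):
            for s in range(n):
                aA[perm[r]][perm[s]] = A[r][s]
        aAs.append(aA)
    return aF, aAs


def mean_graph(aligned):
    """Least-squares model estimate: element-wise mean of the aligned graphs."""
    K = len(aligned)
    n, d, m = len(aligned[0][0]), len(aligned[0][0][0]), len(aligned[0][1])
    mF = [[sum(g[0][k][j] for g in aligned) / K for j in range(d)] for k in range(n)]
    mAs = [
        [[sum(g[1][i][r][s] for g in aligned) / K for s in range(n)] for r in range(n)]
        for i in range(m)
    ]
    return mF, mAs


def learn_model(graphs, iterations=5):
    """graphs: list of (F, As) pairs. Returns the learned (model_F, model_As)."""
    mF, mAs = graphs[0]  # assumption: initialize from the first training graph
    for _ in range(iterations):
        aligned = []
        for F, As in graphs:
            # Matching step: best permutation of this graph onto the current model.
            perm = min(permutations(range(len(F))), key=lambda p: cost(F, As, mF, mAs, p))
            aligned.append(align(F, As, perm))
        # Parameter step: re-estimate the model from the aligned graphs.
        mF, mAs = mean_graph(aligned)
    return mF, mAs
```

With weak supervision, only the class label selects which videos feed `learn_model`; the node correspondences are never given and are re-estimated in every iteration.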

Page 26: Iccv2011 learning spatiotemporal graphs of human activities

Learning Results

Correctly learned activity-characteristic tubes

Page 27: Iccv2011 learning spatiotemporal graphs of human activities

Recognition and Segmentation

Activity “handshaking”: detected and segmented characteristic tube

Page 28: Iccv2011 learning spatiotemporal graphs of human activities

Recognition and Segmentation

Activity “kicking”: detected and segmented characteristic tube

Page 29: Iccv2011 learning spatiotemporal graphs of human activities

Classification on UTexas Dataset

Human interaction activities [18], Ryoo et al. ’10

Page 30: Iccv2011 learning spatiotemporal graphs of human activities

Conclusion

•  Fast spatiotemporal segmentation

•  New activity representation = graph model

•  Unified learning and inference = Least squares

•  Learning under weak supervision:

- WHAT activity parts are relevant and

- HOW relevant they are for recognition