Some Recent Works of Human Activity Recognition

Xinxiao Wu (吴心筱) [email protected]

Action Description

Action, Object and Scene

Multi-View Action Recognition

Action Detection

Complex Activity Recognition

Multimedia Event Detection

Action Description

Extension of Interest Points

Extension of Bag-of-Words

Mid-level Attribute Feature

Dense Trajectory

Action Bank

Action Description

Bregonzio et al., CVPR, 2009

Clouds of interest points accumulated over multiple temporal scales

Extension of Interest Points

Matteo Bregonzio, Shaogang Gong and Tao Xiang. Recognising Action as Clouds of Space-Time Interest Points. CVPR 2009.

Holistic features of the clouds capture the spatio-temporal information of interest points.

Extension of Interest Points

Matteo Bregonzio, Shaogang Gong and Tao Xiang. Recognising Action as Clouds of Space-Time Interest Points. CVPR 2009.

Wu et al., CVPR, 2011

Multi-scale spatio-temporal (ST) context distribution feature

Characterize the spatial and temporal context distributions of interest points over multiple space-time scales.

Extension of Interest Points

Xinxiao Wu, Dong Xu, Lixin Duan and Jiebo Luo. Action recognition using context and appearance distribution features. CVPR 2011.

A set of XYT relative coordinates between the center interest point and other interest points in a local region.

Multi-scale local regions across multiple space-time scales.

Extension of Interest Points

Xinxiao Wu, Dong Xu, Lixin Duan and Jiebo Luo. Action recognition using context and appearance distribution features. CVPR 2011.
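
As a rough illustration of the XYT context feature described above, the following Python sketch collects the relative (x, y, t) offsets between a center interest point and its neighbours inside local regions of several sizes. The function names, the box-shaped region and the scale values are assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, t) location of an interest point


def relative_xyt(center: Point, points: List[Point],
                 spatial_radius: float, temporal_radius: float) -> List[Point]:
    """Offsets of neighbours that fall inside a box-shaped local region."""
    cx, cy, ct = center
    offsets = []
    for (x, y, t) in points:
        if (x, y, t) == center:
            continue  # skip the center point itself (and exact duplicates of it)
        dx, dy, dt = x - cx, y - cy, t - ct
        inside_space = abs(dx) <= spatial_radius and abs(dy) <= spatial_radius
        if inside_space and abs(dt) <= temporal_radius:
            offsets.append((dx, dy, dt))
    return offsets


def multiscale_context(center: Point, points: List[Point],
                       scales: List[Tuple[float, float]]) -> List[List[Point]]:
    """One offset set per (spatial_radius, temporal_radius) scale."""
    return [relative_xyt(center, points, s, t) for (s, t) in scales]


# Toy data: four interest points, two assumed space-time scales.
points = [(10.0, 10.0, 5.0), (12.0, 9.0, 6.0), (30.0, 30.0, 5.0), (11.0, 14.0, 9.0)]
context = multiscale_context(points[0], points, scales=[(5.0, 2.0), (25.0, 5.0)])
```

The small scale keeps only the nearest neighbour, while the larger scale also picks up the distant points; a distribution model can then be fitted to each offset set.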

Wu et al., CVPR, 2011

A global GMM is trained using all local features from all the training videos.

The video-specific GMM for a given video is generated from the global GMM via a maximum a posteriori (MAP) adaptation process.

Extension of Bag-of-Words

Xinxiao Wu, Dong Xu, Lixin Duan and Jiebo Luo. Action recognition using context and appearance distribution features. CVPR 2011.
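
The global-to-video adaptation step can be sketched as follows. This is a minimal example assuming 1-D features, a means-only update and a relevance factor of 16; the slides do not say which GMM parameters are adapted or how, so these choices and all names are illustrative assumptions.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def map_adapt_means(features, weights, means, variances, relevance=16.0):
    """Adapt the global GMM means to the features of one video (assumed means-only MAP)."""
    k = len(means)
    soft_counts = [0.0] * k     # sum of responsibilities per component
    weighted_sums = [0.0] * k   # responsibility-weighted feature sums
    for x in features:
        likes = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        total = sum(likes)
        for j in range(k):
            gamma = likes[j] / total  # posterior of component j given x
            soft_counts[j] += gamma
            weighted_sums[j] += gamma * x
    adapted = []
    for j in range(k):
        # Components that explain more of the video's data move further.
        alpha = soft_counts[j] / (soft_counts[j] + relevance)
        data_mean = weighted_sums[j] / soft_counts[j] if soft_counts[j] > 0 else means[j]
        adapted.append(alpha * data_mean + (1.0 - alpha) * means[j])
    return adapted


# Toy global GMM with two components and the features of one video.
global_weights, global_means, global_vars = [0.5, 0.5], [0.0, 10.0], [1.0, 1.0]
video_features = [1.0, 1.2, 0.8, 1.1, 0.9]
video_means = map_adapt_means(video_features, global_weights, global_means, global_vars)
```

In this toy run the first component shifts toward the video's data, while the second, which explains almost none of it, stays at its global mean.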

GMM vs Bag-of-Words

Kovashka and Grauman, CVPR, 2010

Exploit multiple “bag-of-words” models to represent the hierarchy of space-time configurations at different scales.

Extension of Bag-of-Words

A. Kovashka and K. Grauman. Learning a hierarchy of discriminative space-time neighborhood features for human action recognition. CVPR, 2010.

Savarese et al., WMVC, 2008

Use a local histogram to capture co-occurrences of words in a local region.

Extension of Bag-of-Words

S. Savarese, A. Delpozo, J.C. Niebles and L. Fei-Fei. Spatial-temporal correlatons for unsupervised action classification. WMVC, 2008.

M. Ryoo and J. Aggarwal, ICCV, 2009.

Propose a “featuretype × featuretype × relationship” histogram to capture both appearance and relationship information between pairwise visual words.

Extension of Bag-of-Words

M. Ryoo and J. Aggarwal. Spatio-temporal relationship match: video structure comparison for recognition of complex human activities. ICCV, 2009.
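
A pairwise relationship histogram of this kind can be sketched in a few lines of Python. The three coarse temporal relations used here (`before`, `after`, `simultaneous`) and the tolerance value are simplifying assumptions, not the predicate set of the paper.

```python
from collections import Counter
from itertools import combinations


def temporal_relation(t_a, t_b, tolerance=1.0):
    """Coarse temporal relation between two feature times (assumed predicate set)."""
    if abs(t_a - t_b) <= tolerance:
        return "simultaneous"
    return "before" if t_a < t_b else "after"


def relationship_histogram(features):
    """Count (word_a, word_b, relation) triples over all feature pairs.

    `features` is a list of (visual_word_id, time) tuples.
    """
    hist = Counter()
    for (w_a, t_a), (w_b, t_b) in combinations(features, 2):
        # Order each pair by word id so (a, b) and (b, a) share one bin.
        if w_a > w_b:
            (w_a, t_a), (w_b, t_b) = (w_b, t_b), (w_a, t_a)
        hist[(w_a, w_b, temporal_relation(t_a, t_b))] += 1
    return hist


# Toy video with two visual words (ids 3 and 7) observed at four times.
video = [(3, 0.0), (7, 0.5), (3, 4.0), (7, 9.0)]
hist = relationship_histogram(video)
```

Two videos can then be compared by intersecting their histograms, so that both which words occur and how they are arranged in time contribute to the match.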

Liu et al., CVPR, 2011.

Action attributes: a set of intermediate concepts.

A unified framework: action attributes are effectively selected in a discriminative fashion.

Data-driven Attributes.

Mid-level Attribute Feature

Jingen Liu, Benjamin Kuipers and Silvio Savarese. Recognizing Human Actions by Attributes. CVPR, 2011.

Liu et al., CVPR, 2011.

Data-driven (figure label)

Wang et al., CVPR, 2011.

Sample dense points from each frame and track them based on displacement information from a dense optical flow field.

Dense Trajectory

Heng Wang, Alexander Klaser, Cordelia Schmid and Cheng-Lin Liu. Action Recognition by Dense Trajectories. CVPR, 2011.
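
The sample-and-track idea can be sketched as below. The flow fields are assumed to be given as nested lists of (dx, dy) displacements, the grid stride is arbitrary, and the sketch omits any smoothing of the flow field or pruning of trajectories, so it is an illustration rather than the authors' pipeline.

```python
def sample_grid(width, height, stride):
    """Dense sampling: one point every `stride` pixels."""
    return [(x, y) for y in range(0, height, stride) for x in range(0, width, stride)]


def track_point(start, flows, steps):
    """Follow one point through consecutive flow fields.

    `flows[t][y][x]` is the (dx, dy) displacement of pixel (x, y) between
    frame t and frame t + 1.
    """
    x, y = start
    trajectory = [(x, y)]
    for t in range(min(steps, len(flows))):
        height, width = len(flows[t]), len(flows[t][0])
        xi, yi = int(round(x)), int(round(y))  # nearest pixel to the current position
        if not (0 <= xi < width and 0 <= yi < height):
            break  # the point left the frame; stop tracking
        dx, dy = flows[t][yi][xi]
        x, y = x + dx, y + dy
        trajectory.append((x, y))
    return trajectory


# Toy example: a uniform flow of (+1, 0) pixels per frame on an 8x8 grid.
flows = [[[(1.0, 0.0)] * 8 for _ in range(8)] for _ in range(3)]
trajectories = [track_point(p, flows, steps=3) for p in sample_grid(8, 8, stride=4)]
```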

Wang et al., CVPR, 2011.

Four descriptors: Trajectory; HOG; HOF; MBH.

Heng Wang, Alexander Klaser, Cordelia Schmid and Cheng-Lin Liu. Action Recognition by Dense Trajectories. CVPR, 2011.
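
Of the four descriptors, the trajectory descriptor is the simplest to illustrate. The sketch below assumes one common formulation, in which the sequence of frame-to-frame displacements is normalised by the total path length; the function name and the handling of static points are assumptions.

```python
import math


def trajectory_descriptor(trajectory):
    """Displacement vectors normalised by the total path length."""
    displacements = [(x1 - x0, y1 - y0)
                     for (x0, y0), (x1, y1) in zip(trajectory, trajectory[1:])]
    length = sum(math.hypot(dx, dy) for dx, dy in displacements)
    if length == 0.0:
        return [0.0] * (2 * len(displacements))  # static point: all-zero descriptor
    return [c / length for d in displacements for c in d]


# A three-point trajectory with two displacements of length 5 each.
descriptor = trajectory_descriptor([(0.0, 0.0), (3.0, 4.0), (3.0, 9.0)])
```

HOG, HOF and MBH are then computed in a space-time volume aligned with each trajectory and are not shown here.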

Sadanand and Corso, CVPR, 2012.

Object Bank

Action Bank

Action Bank: a large set of action detectors.

Action Bank

Sreemanananth Sadanand and Jason J. Corso. Action Bank: A High-Level Representation of Activity in Video, CVPR, 2012.

Action, Object and Scene

Nazli Ikizler-Cinbis and Stan Sclaroff, ECCV, 2010

Combine the information from person, object and scene

Multiple instance learning + multiple kernel learning

A bag contains all the instances extracted from a video for a particular feature channel.

Different features have different kernel weights.

Nazli Ikizler-Cinbis and Stan Sclaroff, Object, Scene and Actions: Combining Multiple Features for Human Action Recognition, ECCV, 2010.
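
The idea that different feature channels receive different kernel weights can be illustrated with a weighted sum of kernel matrices. The three toy matrices and the weights below are made-up values; in multiple kernel learning the weights would be learned, and the multiple-instance part of the method is not shown.

```python
def combine_kernels(kernels, weights):
    """Weighted sum of per-channel kernel matrices of identical size."""
    n = len(kernels[0])
    combined = [[0.0] * n for _ in range(n)]
    for kernel, beta in zip(kernels, weights):
        for i in range(n):
            for j in range(n):
                combined[i][j] += beta * kernel[i][j]
    return combined


# Hypothetical kernels between two videos for the person, object and scene channels.
person_kernel = [[1.0, 0.2], [0.2, 1.0]]
object_kernel = [[1.0, 0.8], [0.8, 1.0]]
scene_kernel = [[1.0, 0.5], [0.5, 1.0]]
K = combine_kernels([person_kernel, object_kernel, scene_kernel], [0.5, 0.3, 0.2])
```

The combined matrix can be passed to any kernel classifier that accepts a precomputed kernel.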

Marcin Marszalek, Ivan Laptev and Cordelia Schmid, CVPR 2009.

Automatically discover the relation between scene classes and human actions using movie scripts.

Develop a joint framework for action and scene recognition in natural video.

Marcin Marszalek, Ivan Laptev and Cordelia Schmid. Actions in Context. CVPR, 2009.

Multi-View Action Recognition

Multiple Views

View-invariant Recognition

Cross-view Recognition

Weinland et al., ICCV, 2007.

A 3D visual hull is proposed to represent an action exemplar using a system of 5 calibrated cameras.

Daniel Weinland, Edmond Boyer and Remi Ronfard. Action recognition from arbitrary views using 3D exemplars. ICCV, 2007.

View-invariant

Weinland et al., ICCV, 2007.

3D exemplar-based HMM for classification

Daniel Weinland, Edmond Boyer and Remi Ronfard. Action recognition from arbitrary views using 3D exemplars. ICCV, 2007.

View-invariant

Yan et al., CVPR, 2008.

4D action feature: 3D shapes over time (4D)

Pingkun Yan, Saad M. Khan, Mubarak Shah. Learning 4D Action Feature Models for Arbitrary View Action Recognition. CVPR, 2008.

View-invariant

Junejo et al., IEEE T-PAMI, 2008.

A novel view-invariant feature: self-similarity descriptor

Frame-to-frame similarity

Imran N. Junejo, Emilie Dexter, Ivan Laptev and Patrick Perez. View-independent action recognition from temporal self-similarities. IEEE T-PAMI, 2008.
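
The frame-to-frame similarity behind the self-similarity descriptor can be sketched as a matrix of pairwise distances. Euclidean distance and the toy two-dimensional per-frame features are assumptions; any per-frame feature vector could be substituted.

```python
import math


def self_similarity_matrix(frame_features):
    """Pairwise Euclidean distances between per-frame feature vectors."""
    n = len(frame_features)
    ssm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.sqrt(sum((a - b) ** 2
                              for a, b in zip(frame_features[i], frame_features[j])))
            ssm[i][j] = ssm[j][i] = d  # the matrix is symmetric
    return ssm


# Toy sequence: the third frame repeats the first, so their distance is zero.
frames = [(0.0, 0.0), (3.0, 4.0), (0.0, 0.0)]
ssm = self_similarity_matrix(frames)
```

The matrix depends only on how frames relate to each other over time, not on absolute image coordinates, which is the intuition behind using its structure as a view-tolerant representation.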

View-invariant

Lewandowski et al., ECCV, 2010.

View-independent manifold representation

A style-invariant embedded manifold is produced to describe an action for each view.

All view-dependent manifolds are automatically combined to generate a unified manifold.

Michal Lewandowski, Dimitrios Makris and Jean-Christophe Nebel. View and style-independent action manifolds for human activity recognition. ECCV, 2010.

View-invariant

Wu and Jia, ECCV, 2012.

Propose a latent kernelized structural SVM.

The view index is treated as a latent variable and inferred during both training and testing.

Xinxiao Wu and Yunde Jia. View-Invariant action recognition using latent kernelized structural SVM. ECCV, 2012.
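
Treating the view index as a latent variable means that prediction maximises over views as well as over action labels. The sketch below assumes a plain linear score per (action, view) pair instead of the kernelized model, and the weight vectors are made-up values for illustration.

```python
def predict_action(x, weights):
    """Return the (action, view) pair with the highest linear score.

    `weights[action][view]` is a hypothetical weight vector; the view is
    never observed and is inferred as part of the maximisation.
    """
    best = None
    for action, per_view in weights.items():
        for view, w in per_view.items():
            score = sum(wi * xi for wi, xi in zip(w, x))
            if best is None or score > best[0]:
                best = (score, action, view)
    return best[1], best[2]


weights = {
    "wave": {"front": [1.0, 0.0], "side": [0.0, 1.0]},
    "kick": {"front": [-1.0, 0.5], "side": [0.5, -1.0]},
}
action, view = predict_action([0.2, 0.9], weights)
```

During training the same maximisation over views would be used to fill in the missing view label for each training video before the weights are updated.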

Cross-view

Liu et al., CVPR, 2011.

Learn bilingual words from both the source view and the target view.

Transfer action models between two views via the bag-of-bilingual-words model.

Jingen Liu, Mubarak Shah, Benjamin Kuipers and Silvio Savarese. Cross-View Action Recognition via View Knowledge Transfer. CVPR 2011.

Cross-view

Li and Zickler, CVPR, 2012.

Propose “virtual views” to connect action descriptors from the source view and the target view.

Each virtual view is associated with a linear transformation of the action descriptor, and the sequence of transformations arising from the sequence of virtual views aims at bridging the source and target views.

Ruonan Li and Todd Zickler. Discriminative virtual views for cross-view action recognition. CVPR, 2012.

Cross-view

Wu et al., PCM, 2012.

Transfer Discriminant-Analysis of Canonical Correlations (Transfer DCC).

Minimize the mismatch between data distributions of source and target views.

Xinxiao Wu, Cuiwei Liu, and Yunde Jia. Transfer discriminant-analysis of canonical correlations for view-transfer action recognition, PCM, 2012.

Action Detection

Yuan et al., IEEE T-PAMI, 2011.

A discriminative pattern matching criterion for action classification: naïve-Bayes mutual information maximization (NBMIM).

An efficient search algorithm: spatio-temporal branch-and-bound (STBB) search.

Junsong Yuan, Zicheng Liu and Ying Wu. Discriminative video pattern search for efficient action detection. IEEE T-PAMI, 2011.
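
To give a feel for detection as a search for the best-scoring region, the sketch below solves a much simpler 1-D analogue: given one signed score per frame, it finds the contiguous temporal window with the largest summed score using Kadane's algorithm. This is not the STBB search, which works on 3-D subvolumes with branch-and-bound, and the per-frame scores are made-up values.

```python
def best_temporal_window(frame_scores):
    """Kadane's algorithm: contiguous window with the largest summed score.

    `frame_scores` must be non-empty. Returns (sum, (start, end)) with
    inclusive frame indices.
    """
    best_sum, best_range = frame_scores[0], (0, 0)
    run_sum, run_start = 0.0, 0
    for t, s in enumerate(frame_scores):
        if run_sum <= 0.0:
            run_sum, run_start = s, t   # start a new window at frame t
        else:
            run_sum += s                # extend the current window
        if run_sum > best_sum:
            best_sum, best_range = run_sum, (run_start, t)
    return best_sum, best_range


# Positive scores vote for the action, negative scores vote against it.
scores = [-1.0, 2.0, 3.0, -4.0, 2.5, -0.5]
total, (start, end) = best_temporal_window(scores)
```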

Hu et al., ICCV, 2009.

The candidate regions of an action are treated as a bag of instances.

A novel multiple-instance learning framework, named SMILE-SVM (Simulated annealing Multiple Instance Learning Support Vector Machines), is proposed for learning a human action detector.

Yuxiao Hu, Liangliang Cao, Fengjun Lv, Shuicheng Yan, Yihong Gong and Thomas S. Huang. Action detection in complex scenes with spatial and temporal ambiguities. ICCV, 2009.

Complex Activity Recognition

Gaidon et al., CVPR, 2011.

Actom Sequence Model: represent an activity as a sequence of atomic action-anchored visual features.

Automatically detect atomic actions from an input activity video.

A. Gaidon, Z. Harchaoui, and C. Schmid. Actom sequence models for efficient action detection. CVPR, 2011.

Hoai et al., CVPR, 2011.

Jointly perform video segmentation and action recognition.

M. Hoai, Z. Lan, and F. De la Torre. Joint segmentation and classification of human actions in video. CVPR, 2011.
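
Joint segmentation and recognition is commonly solved with dynamic programming over segment boundaries and labels. The sketch below is a generic version of that idea under simple assumptions: per-frame class scores are given, a segment's score is the sum of its frames' scores for one label, and segment lengths are bounded. It is not the authors' model or scoring function.

```python
def segment_and_classify(frame_scores, min_len=1, max_len=None):
    """Dynamic programming over segment boundaries and labels.

    `frame_scores[t][c]` is a per-frame score for class c. Returns the best
    total score and a list of (start, end, label) segments with `end`
    exclusive. Assumes the video can be tiled with segments of allowed length.
    """
    n = len(frame_scores)
    classes = range(len(frame_scores[0]))
    max_len = max_len or n
    best = [float("-inf")] * (n + 1)   # best[t]: best score for frames [0, t)
    best[0] = 0.0
    back = [None] * (n + 1)            # back[t]: (segment start, label)
    for end in range(1, n + 1):
        for length in range(min_len, min(max_len, end) + 1):
            start = end - length
            for c in classes:
                seg = sum(frame_scores[t][c] for t in range(start, end))
                if best[start] + seg > best[end]:
                    best[end], back[end] = best[start] + seg, (start, c)
    segments, t = [], n
    while t > 0:                       # walk the back-pointers from the end
        start, c = back[t]
        segments.append((start, t, c))
        t = start
    return best[n], segments[::-1]


# Toy video: class 0 scores well early, class 1 scores well late.
scores = [[2.0, 0.0], [1.5, 0.2], [0.1, 1.0], [0.0, 3.0]]
total, segments = segment_and_classify(scores, min_len=2)
```

The minimum segment length keeps the program from cutting the video into single frames, which plays the role of a simple prior on action duration.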

Tang et al., CVPR, 2012.

Each activity is modeled by a set of latent state variables and duration variables.

The states are the cluster centers obtained by clustering all fixed-length video clips from the training data.

A max-margin discriminative model is introduced to learn the temporal structure of complex events.

K. Tang, L. Fei-Fei, and D. Koller. Learning latent temporal structure for complex event detection. CVPR, 2012.

Multimedia Event Detection

Izadinia and Shah, ECCV, 2012.

A latent discriminative model is proposed to detect the low-level events by modeling the co-occurrence relationship between different low-level events in a graph.

Each video is divided into short clips, and each clip is manually annotated with one low-level event label; these annotations are used for training the low-level detectors.

H. Izadinia and M. Shah. Recognizing complex events using large margin joint low-level event model. ECCV, 2012.

Thanks for your attention!

Q & A?