Hyeonsoo Kang: Unsupervised Mining of Statistical Temporal Structures in Video


TRANSCRIPT

Page 1: Hyeonsoo , Kang

Hyeonsoo, Kang

Unsupervised Mining of Statistical Temporal Structures in Video

Page 2: Hyeonsoo , Kang

▫ Introduction

▫ Structure of the algorithm

1. Model learning algorithm
2. [Review HMM]
3. Feature selection algorithm

▫ Results

Page 3: Hyeonsoo , Kang

What is “supervised learning?”

Page 4: Hyeonsoo , Kang

What is “supervised learning?”

It is the approach in which the algorithm designers manually identify important structures, collect labeled data for training, and apply supervised learning tools to learn the classifiers.

Page 5: Hyeonsoo , Kang

Good: Works for domain-specific problems at a small scale.

Bad: Burden of labeling and training; cannot be readily extended to diverse new domains at a large scale.

Page 6: Hyeonsoo , Kang

Good: Works for domain-specific problems at a small scale.

Bad: Burden of labeling and training; cannot be readily extended to diverse new domains at a large scale.

Let’s aim at an automated method that not only works well for domain-specific problems but is also flexible and scalable!

Page 7: Hyeonsoo , Kang

Good: Works for domain-specific problems at a small scale.

Bad: Burden of labeling and training; cannot be readily extended to diverse new domains at a large scale.

Let’s aim at an automated method that not only works well for domain-specific problems but is also flexible and scalable!

But is that possible …?

Page 8: Hyeonsoo , Kang

A temporal sequence of nine shots, each shot one second apart

Observations?

Page 9: Hyeonsoo , Kang

Similar color & movements

A temporal sequence of nine shots, each shot one second apart

Page 10: Hyeonsoo , Kang

A temporal sequence of nine shots, each shot one second apart

Observations?

Page 11: Hyeonsoo , Kang

Different color

A temporal sequence of nine shots, each shot one second apart

Page 12: Hyeonsoo , Kang

A temporal sequence of nine shots, each shot one second apart

Observations?

Page 13: Hyeonsoo , Kang

Different camera walk

A temporal sequence of nine shots, each shot one second apart

Page 14: Hyeonsoo , Kang

Let’s focus on a particular domain of videos, such that:

(1) The video structure is in a discrete state space.

(2) The features, i.e., the observations from the data, are stochastic (small statistical variations on the raw features).

(3) The sequence is highly correlated in time.

Page 15: Hyeonsoo , Kang

Unsupervised learning approaches here are chiefly twofold:

(a) Model learning algorithm

(b) Feature selection algorithm

Page 16: Hyeonsoo , Kang

(a) Model learning algorithm

(b) Feature selection algorithm

Using a fixed feature set manually selected based on heuristics, build a model that performs well at distinguishing the high-level structures of the given video.

Using both the model learning algorithm and the feature selection algorithm yields a model and a set of features that together distinguish the high-level structures of the given video well.

Page 17: Hyeonsoo , Kang

(a) Model learning algorithm

(b) Feature selection algorithm

Using a fixed feature set manually selected based on heuristics, build a model that performs well at distinguishing the high-level structures of the given video.

Using both the model learning algorithm and the feature selection algorithm yields a model and a set of features that together distinguish the high-level structures of the given video well.

Page 18: Hyeonsoo , Kang

(a) Model learning algorithm

1. Baseline: uses a two-level HHMM to model structures in video.

2. HHMM ::= Hierarchical Hidden Markov Model.

The Hierarchical Hidden Markov Model is a statistical model derived from the Hidden Markov Model (HMM). The HHMM exploits its hierarchical structure to solve a subset of problems more efficiently, but it can be transformed into a standard HMM. Therefore, the coverage of the HHMM and the HMM is the same, but their efficiency differs.

Page 19: Hyeonsoo , Kang

(a) Model learning algorithm

1. Baseline: uses a two-level HHMM to model structures in video.

2. HHMM ::= Hierarchical Hidden Markov Model.

The Hierarchical Hidden Markov Model is a statistical model derived from the Hidden Markov Model (HMM). The HHMM exploits its hierarchical structure to solve a subset of problems more efficiently, but it can be transformed into a standard HMM. Therefore, the coverage of the HHMM and the HMM is the same, but their efficiency differs. Wait, what is an HMM then?

Page 20: Hyeonsoo , Kang

[Quick Review: HMM]

Consider a simple 3-state Markov model of the weather. We assume that once a day (e.g., at noon), the weather is observed as being one of the following:

(S1) State 1: rain (or snow)
(S2) State 2: cloudy
(S3) State 3: sunny

We postulate that the weather on day t is characterized by a single one of the three states above, and that the matrix A of state transition probabilities is

A = {aij} =
    | 0.4  0.3  0.3 |
    | 0.2  0.6  0.2 |
    | 0.1  0.1  0.8 |

Given that the weather on day 1 (t = 1) is sunny (state 3), we can ask the question: what is the probability (according to the model) that the weather for the next 7 days will be “sunny-sunny-rain-rain-sunny-cloudy-sunny”?

Page 21: Hyeonsoo , Kang

[Quick Review: HMM]

Stated more formally, we define the observation sequence O as

O = {S3, S3, S3, S1, S1, S3, S2, S3} (“sunny, sunny, sunny, rain, rain, sunny, cloudy, sunny”)

corresponding to t = 1, 2, …, 8, and we wish to determine the probability of O, given the model. This probability can be expressed (and evaluated) as

P(O|Model) = P[S3, S3, S3, S1, S1, S3, S2, S3 | Model]

Page 22: Hyeonsoo , Kang

[Quick Review: HMM]

Stated more formally, we define the observation sequence O as

O = {S3, S3, S3, S1, S1, S3, S2, S3} (“sunny, sunny, sunny, rain, rain, sunny, cloudy, sunny”)

corresponding to t = 1, 2, …, 8, and we wish to determine the probability of O, given the model. This probability can be expressed (and evaluated) as

P(O|Model) = P[S3, S3, S3, S1, S1, S3, S2, S3 | Model]
           = P[S3] P[S3|S3] P[S3|S3] P[S1|S3] P[S1|S1] P[S3|S1] P[S2|S3] P[S3|S2]

Page 23: Hyeonsoo , Kang

[Quick Review: HMM]

Stated more formally, we define the observation sequence O as

O = {S3, S3, S3, S1, S1, S3, S2, S3} (“sunny, sunny, sunny, rain, rain, sunny, cloudy, sunny”)

corresponding to t = 1, 2, …, 8, and we wish to determine the probability of O, given the model. This probability can be expressed (and evaluated) as

P(O|Model) = P[S3, S3, S3, S1, S1, S3, S2, S3 | Model]
           = P[S3] P[S3|S3] P[S3|S3] P[S1|S3] P[S1|S1] P[S3|S1] P[S2|S3] P[S3|S2]
           = π3 · a33 · a33 · a31 · a11 · a13 · a32 · a23

Page 24: Hyeonsoo , Kang

[Quick Review: HMM]

Stated more formally, we define the observation sequence O as

O = {S3, S3, S3, S1, S1, S3, S2, S3} (“sunny, sunny, sunny, rain, rain, sunny, cloudy, sunny”)

corresponding to t = 1, 2, …, 8, and we wish to determine the probability of O, given the model. This probability can be expressed (and evaluated) as

P(O|Model) = P[S3, S3, S3, S1, S1, S3, S2, S3 | Model]
           = P[S3] P[S3|S3] P[S3|S3] P[S1|S3] P[S1|S1] P[S3|S1] P[S2|S3] P[S3|S2]
           = π3 · a33 · a33 · a31 · a11 · a13 · a32 · a23
           = 1 · (0.8)(0.8)(0.1)(0.4)(0.3)(0.1)(0.2)

Page 25: Hyeonsoo , Kang

[Quick Review: HMM]

Stated more formally, we define the observation sequence O as

O = {S3, S3, S3, S1, S1, S3, S2, S3} (“sunny, sunny, sunny, rain, rain, sunny, cloudy, sunny”)

corresponding to t = 1, 2, …, 8, and we wish to determine the probability of O, given the model. This probability can be expressed (and evaluated) as

P(O|Model) = P[S3, S3, S3, S1, S1, S3, S2, S3 | Model]
           = P[S3] P[S3|S3] P[S3|S3] P[S1|S3] P[S1|S1] P[S3|S1] P[S2|S3] P[S3|S2]
           = π3 · a33 · a33 · a31 · a11 · a13 · a32 · a23
           = 1 · (0.8)(0.8)(0.1)(0.4)(0.3)(0.1)(0.2)
           = 1.536 × 10^-4

where we use the notation πi = P[q1 = Si], 1 <= i <= N, to denote the initial state probabilities.

So far this is an observable Markov model: every state corresponds directly to an observable event.
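As a sanity check, here is a small Python sketch (not part of the original slides) that reproduces this number; the transition matrix is the one from Rabiner's tutorial [1], with the states ordered (S1 rain, S2 cloudy, S3 sunny).

```python
import numpy as np

# Transition matrix from Rabiner's weather example [1];
# rows/columns are ordered (S1 rain, S2 cloudy, S3 sunny).
A = np.array([[0.4, 0.3, 0.3],
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]])
pi = np.array([0.0, 0.0, 1.0])   # day 1 is known to be sunny, so pi_3 = 1

O = [2, 2, 2, 0, 0, 2, 1, 2]     # S3,S3,S3,S1,S1,S3,S2,S3 (0-indexed)

p = pi[O[0]]
for prev, cur in zip(O, O[1:]):
    p *= A[prev, cur]            # multiply the transition probabilities along O
print(p)                         # 1.536e-04
```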

Page 26: Hyeonsoo , Kang

[Quick Review: HMM]

The Hidden Markov Model is not too different from the observable MM; the only change is that each state no longer corresponds directly to an observable (physical) event.

For example, assume the following scenario. You are in a room with a curtain through which you cannot see what is happening. On the other side of the curtain another person is performing a coin (or multiple coin) tossing experiment. The other person will not tell you anything about what he is doing exactly; he will only tell you the result of each coin flip.

An HMM is characterized by the following:
1) N, the number of states in the model
2) M, the number of distinct observation symbols per state
3) The state transition probability distribution A = {aij}
4) The observation symbol probability distribution in state j, B = {bj(k)}, where bj(k) = P[vk at t | qt = Sj], 1 <= j <= N, 1 <= k <= M
5) The initial state distribution π = {πi}, where πi = P[q1 = Si], 1 <= i <= N

Page 27: Hyeonsoo , Kang

[Quick Review: HMM]

An HMM requires specification of two model parameters (N and M), specification of the observation symbols, and specification of the three probability measures A, B, and π. Since N and M are implicit in the other variables, we can use the compact notation λ = (A, B, π).
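To make the compact notation concrete, here is a minimal sketch (my own illustration, with made-up numbers rather than anything from the paper) of a discrete-observation HMM λ = (A, B, π) and of evaluating P(O | λ) with the forward algorithm, which sums over all hidden state paths.

```python
import numpy as np

# Illustrative 2-state, 2-symbol HMM; the numbers are invented for this example.
A = np.array([[0.7, 0.3],        # a_ij: state transition probabilities
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],        # b_j(k): P(symbol k | state j)
              [0.2, 0.8]])
pi = np.array([0.6, 0.4])        # initial state distribution

def forward_likelihood(obs, A, B, pi):
    """P(O | lambda), summing over all hidden state sequences."""
    alpha = pi * B[:, obs[0]]            # alpha_1(j) = pi_j * b_j(o_1)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]    # induction step of the forward algorithm
    return alpha.sum()

print(forward_likelihood([0, 0, 1, 1, 0], A, B, pi))
```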

Page 28: Hyeonsoo , Kang

(a) Model learning algorithm

1. Baseline: uses an HHMM.

2. HHMM ::= Hierarchical Hidden Markov Model.

The Hierarchical Hidden Markov Model is a statistical model derived from the Hidden Markov Model (HMM). The HHMM exploits its hierarchical structure to solve a subset of problems more efficiently, but it can be transformed into a standard HMM. Therefore, the coverage of the HHMM and the HMM is the same, but their efficiency differs. Now, to build an HHMM, we need to estimate its parameters, just as we did for the HMM.

Page 29: Hyeonsoo , Kang

(a) Model learning algorithm

Now, to build an HHMM, we need to estimate its parameters, just as we did for the HMM.

We model the recurring events in each video as HMMs, and the higher-level transitions between these events as another level of Markov chain. In this two-level HHMM, the lower-level states represent variations that can occur within the same event (the observations, i.e., measurements taken from the raw video, are modeled with a mixture of Gaussians), while the higher-level structure elements usually correspond to semantic events.
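As a rough illustration of this two-level idea (not the paper's actual model or parameters), the sketch below samples from a toy HHMM: a high-level chain over the events "play" and "break", and, inside each event, a small lower-level HMM whose sub-states emit Gaussian observations standing in for features such as the dominant color ratio. The high-level chain only advances when the lower-level chain exits.

```python
import numpy as np

rng = np.random.default_rng(0)

events = ["play", "break"]
event_trans = np.array([[0.7, 0.3],               # P(next event | current event)
                        [0.4, 0.6]])
sub_trans = {"play":  np.array([[0.8, 0.2], [0.3, 0.7]]),   # within-event transitions
             "break": np.array([[0.6, 0.4], [0.5, 0.5]])}
exit_prob = {"play": [0.05, 0.10], "break": [0.15, 0.15]}   # chance the sub-chain exits
emit_mean = {"play": [0.85, 0.65], "break": [0.25, 0.35]}   # Gaussian emission means

def sample(T=15):
    e, s = 0, 0                     # current event and sub-state
    seq = []
    for _ in range(T):
        name = events[e]
        obs = rng.normal(emit_mean[name][s], 0.05)
        seq.append((name, s, obs))
        if rng.random() < exit_prob[name][s]:
            # lower-level chain exits: the higher-level event transitions,
            # and the new event's sub-chain restarts from sub-state 0
            e = rng.choice(2, p=event_trans[e])
            s = 0
        else:
            s = rng.choice(2, p=sub_trans[name][s])
    return seq

for name, s, obs in sample():
    print(f"{name:5s} sub-state {s}  obs = {obs:.3f}")
```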

Page 30: Hyeonsoo , Kang

An example of HHMM

Page 31: Hyeonsoo , Kang

An example of HHMM: the higher-level states are sunny, rain, and cloudy, and the lower-level nodes represent variations within each state.

Page 32: Hyeonsoo , Kang

(a) Model learning algorithm

3. To estimate the parameters we use:

(1) The Expectation Maximization (EM) algorithm
(2) Bayesian learning techniques
(3) Reverse-jump Markov chain Monte Carlo (RJ-MCMC)
(4) The Bayesian Information Criterion (BIC)

Page 33: Hyeonsoo , Kang

(a) Model learning algorithm

3. To estimate the parameters we use:

(1) The Expectation Maximization (EM) algorithm
(2) Bayesian learning techniques
(3) Reverse-jump Markov chain Monte Carlo (RJ-MCMC)
(4) The Bayesian Information Criterion (BIC)

Model parameters are updated using EM. Model structure learning uses MCMC: parameter learning for the HHMM with EM is known to converge only to a local maximum of the data likelihood, since EM is a hill-climbing algorithm, and searching for the global maximum of the likelihood landscape is intractable, so we adopt a randomized search.
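The sketch below illustrates only the general idea of "restart the hill-climbing EM and score candidates with BIC"; a 1-D Gaussian mixture stands in for the HHMM emission model, and all data and numbers are synthetic, so this is not the paper's actual learning procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def em_gmm_1d(x, k, n_iter=200):
    """EM for a 1-D Gaussian mixture with k components (a stand-in model)."""
    n = len(x)
    means = rng.choice(x, size=k, replace=False).astype(float)  # random restart
    var = np.full(k, x.var())
    w = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point
        logp = (-0.5 * ((x[:, None] - means) ** 2 / var + np.log(2 * np.pi * var))
                + np.log(w))
        m = logp.max(axis=1, keepdims=True)
        resp = np.exp(logp - m)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and variances
        nk = resp.sum(axis=0) + 1e-9
        w, means = nk / n, (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-6
    # final data log-likelihood under the learned parameters
    logp = (-0.5 * ((x[:, None] - means) ** 2 / var + np.log(2 * np.pi * var))
            + np.log(w))
    m = logp.max(axis=1, keepdims=True)
    loglik = (m[:, 0] + np.log(np.exp(logp - m).sum(axis=1))).sum()
    return loglik, 3 * k - 1          # free parameters: means, variances, weights

def bic(loglik, n_params, n):
    return loglik - 0.5 * n_params * np.log(n)   # higher is better in this form

# synthetic 1-D feature stream with two regimes
x = np.concatenate([rng.normal(0.2, 0.05, 300), rng.normal(0.8, 0.05, 300)])
candidates = [(k, bic(*em_gmm_1d(x, k), len(x))) for k in (1, 2, 3) for _ in range(5)]
print("best (k, BIC):", max(candidates, key=lambda c: c[1]))
```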

Page 34: Hyeonsoo , Kang

(a) Model learning algorithm

3. To estimate the parameters we use:

(1) The Expectation Maximization (EM) algorithm
(2) Bayesian learning techniques
(3) Reverse-jump Markov chain Monte Carlo (RJ-MCMC)
(4) The Bayesian Information Criterion (BIC)

Model parameters are updated using EM. Model structure learning uses MCMC: parameter learning for the HHMM with EM is known to converge only to a local maximum of the data likelihood, since EM is a hill-climbing algorithm, and searching for the global maximum of the likelihood landscape is intractable, so we adopt a randomized search.

However, I will not go through them one by one; if you are interested, you can find them in the paper by Xie, Lexing, et al. [3].

Page 35: Hyeonsoo , Kang

(a) Model learning algorithm

(b) Feature selection algorithm

Using a fixed feature set manually selected based on heuristics, build a model that performs well at distinguishing the high-level structures of the given video.

Using both the model learning algorithm and the feature selection algorithm yields a model and a set of features that together distinguish the high-level structures of the given video well.

Page 36: Hyeonsoo , Kang

(a) Model learning algorithm

(b) Feature selection algorithm

Using a fixed feature set manually selected based on heuristics, build a model that performs well at distinguishing the high-level structures of the given video.

Using both the model learning algorithm and the feature selection algorithm yields a model and a set of features that together distinguish the high-level structures of the given video well.

Page 37: Hyeonsoo , Kang

Into what aspects can feature selection be divided, and why?

Page 38: Hyeonsoo , Kang

Feature selection is divided into two aspects:

(1) Eliminating irrelevant features – irrelevant features usually disturb the classifier and degrade classification accuracy.

(2) Eliminating redundant ones – redundant features add to computational cost without bringing in new information.

Into what aspects can feature selection be divided, and why?

Page 39: Hyeonsoo , Kang

(b) Feature selection algorithm

1. We use a filter-wrapper method: the wrapper step corresponds to eliminating irrelevant features, and the filter step corresponds to eliminating redundant ones.
(a) Wrapper step – partitions the feature pool into consistent groups
(b) Filter step – eliminates redundant dimensions

2. For example, the feature pool contains features such as Dominant Color Ratio (DCR), Motion Intensity (MI), the least-squares estimation of camera translation (MX, MY), and five audio features – Volume, Spectral roll-off (SR), Low-band energy (LE), High-band energy (HE), and Zero-crossing rate (ZCR).
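As a sketch of the relevance measure used in the wrapper step, the code below computes the information gain (mutual information) of two candidate features about a label sequence; the "Viterbi" labels and feature values here are synthetic, and the discretization into 8 bins is my own choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def information_gain(labels, feature, bins=8):
    """Mutual information between the label sequence and a discretized feature."""
    edges = np.histogram_bin_edges(feature, bins=bins)
    f = np.digitize(feature, edges[1:-1])           # bin index in 0..bins-1
    joint = np.zeros((labels.max() + 1, bins))
    for lab, v in zip(labels, f):
        joint[lab, v] += 1
    joint /= joint.sum()
    pl = joint.sum(axis=1, keepdims=True)           # marginal over labels
    pf = joint.sum(axis=0, keepdims=True)           # marginal over feature bins
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / (pl @ pf)[nz])).sum())

# toy label sequence (0 = break, 1 = play) and two candidate features
labels = (rng.random(500) < 0.6).astype(int)
dcr = 0.3 + 0.5 * labels + rng.normal(0, 0.05, 500)   # correlated with the labels
zcr = rng.normal(0.5, 0.1, 500)                        # irrelevant noise feature

for name, feat in [("DCR", dcr), ("ZCR", zcr)]:
    print(name, "information gain =", round(information_gain(labels, feat), 3))
```

Features with near-zero information gain would be treated as irrelevant, while the Markov blanket filtering that follows targets redundancy among the remaining ones.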

Page 40: Hyeonsoo , Kang

(b) Feature selection algorithm

3. Algorithm structure. The big picture:

HHMM learning → Viterbi state sequence → information gain (wrapper step) → Markov blanket filtering (filter step) → BIC fitness

Page 41: Hyeonsoo , Kang

(b) Feature selection algorithm

3. Algorithm structure. The big picture:

HHMM learning → Viterbi state sequence → information gain (wrapper step) → Markov blanket filtering (filter step) → BIC fitness

In detail:

Page 42: Hyeonsoo , Kang

Experiments and Results

For soccer videos, the main evaluation focused on distinguishing the two semantic events, play and break.

(a) Model learning algorithm

Page 43: Hyeonsoo , Kang

Experiments and Results

For soccer videos, the main evaluation focused on distinguishing the two semantic events, play and break.

(a) Model learning algorithm

We use a fixed set of features manually selected based on heuristics (dominant color ratio and motion intensity) (Xu et al., 2001; Xie et al., 2002b).

Page 44: Hyeonsoo , Kang

Experiments and Results

For soccer videos, the main evaluation focused on distinguishing the two semantic events, play and break.

(a) Model learning algorithm

We use a fixed set of features manually selected based on heuristics (dominant color ratio and motion intensity) (Xu et al., 2001; Xie et al., 2002b).

We built four different learning schemes and evaluated them against the ground truth:

(1) Supervised HMM
(2) Supervised HHMM
(3) Unsupervised HHMM without model adaptation
(4) Unsupervised HHMM with model adaptation

Page 45: Hyeonsoo , Kang

Experiments and Results

Page 46: Hyeonsoo , Kang

Experiments and Results

For soccer videos, the main evaluation focused on distinguishing the two semantic events, play and break.

(b) Feature selection algorithm

Based on the good performance of the model parameter and structure learning algorithm, we test the performance of the automatic feature selection method that iteratively wraps around and filters.

A 9-dimensional feature vector, sampled every 0.1 seconds, includes: Dominant Color Ratio (DCR), Motion Intensity (MI), the least-squares estimation of camera translation (MX, MY), and five audio features – Volume, Spectral roll-off (SR), Low-band energy (LE), High-band energy (HE), and Zero-crossing rate (ZCR).

Page 47: Hyeonsoo , Kang

Experiments and Results

Evaluation against the play/break labels showed 74.8% accuracy.

For the clip Spain, the final selected feature set was {DCR, Volume}, with 74.8% accuracy.
For the clip Korea, the final selected feature set was {DCR, MX}, with 74.5% accuracy.

[Testing on the baseball video] yielded three consistent compact feature groups: {HE, LE, ZCR}, {DCR, MX}, {Volume, SR}.

The resulting segments have consistent perceptual properties, with one cluster of segments mostly corresponding to pitching shots and other field shots when the game is in play, while the other cluster contains most of the cutaway shots, scoreboards, and game breaks.

Page 48: Hyeonsoo , Kang

Summary

Within a specific domain of videos (sports: soccer and baseball), our unsupervised learning method can perform well.

Our method was chiefly twofold: one part was the model learning algorithm and the other the feature selection algorithm.

In the model learning algorithm, we used the HHMM as the basic model and used techniques such as the Expectation Maximization (EM) algorithm, Bayesian learning techniques, Reverse-jump Markov Chain Monte Carlo (RJ-MCMC), and the Bayesian Information Criterion (BIC) to set the parameters of the model.

In the feature selection algorithm, together with a model of good performance, we used filter-wrapper methods to eliminate irrelevant and redundant features.

Page 49: Hyeonsoo , Kang

Questions
1. What is supervised learning?

2. What is the benefit of using unsupervised learning?

3. Into what aspects can feature selection be divided, and why?

Page 50: Hyeonsoo , Kang

Questions

1. What is supervised learning?
The algorithm designers manually identify important structures, collect labelled data for training, and apply supervised learning tools to learn the classifiers.

2. What is the benefit of using unsupervised learning?
(A) It alleviates the burden of labelling and training.
(B) It also provides a scalable solution for generalizing video indexing techniques.

3. Into what aspects can feature selection be divided, and why?
Feature selection is divided into two aspects:
(1) Eliminating irrelevant features: usually irrelevant features disturb the classifier and degrade classification accuracy.
(2) Eliminating redundant ones: redundant features add to computational cost without bringing in new information.

Page 51: Hyeonsoo , Kang

Bibliography
[1] Rabiner, Lawrence R. "A tutorial on hidden Markov models and selected applications in speech recognition." Proceedings of the IEEE 77.2 (1989): 257-286.
[2] Xie, Lexing, et al. "Structure analysis of soccer video with hidden Markov models." Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on. Vol. 4. IEEE, 2002.
[3] Xie, Lexing, et al. "Unsupervised mining of statistical temporal structures in video." Video Mining. Springer US, 2003. 279-307.
[4] Xu, Peng, et al. "Algorithms and system for segmentation and structure analysis in soccer video." IEEE International Conference on Multimedia and Expo. 2001.

Page 52: Hyeonsoo , Kang

THANK YOU!

Page 53: Hyeonsoo , Kang

Q & A