Movie Content Analysis, Indexing and Skimming
김덕주 (Duck Ju Kim)

Post on 22-Dec-2015
TRANSCRIPT

  • Slide 1
  • Movie Content Analysis, Indexing and Skimming (Duck Ju Kim)
  • Slide 2
  • Problems: What is the objective of content-based video analysis? Why does supervised identification have limitations? Why should integrated media data be used?
  • Slide 3
  • Introduction. Analysis: structured organization; embedded semantics. Indexing: tagging semantic units; limited machine perception. Skimming: abstraction and presentation; video browsing.
  • Slide 4
  • Event Detection Approach. Shot detection: low-level structure; does not correspond directly to video semantics. Scene extraction: higher-level context; contains many unimportant contents. Event extraction: higher semantic level; better at revealing, representing, and abstracting the content.
  • Slide 5
  • Speaker Identification Approach. Standard speech databases: YOHO, HUB4, SWITCHBOARD. Integration of media cues: speaker recognition + facial analysis; speech cues + visual cues. Supervised identification: fixed speaker models; insufficient training data; data collection before processing.
  • Slide 6
  • Video Skimming Approach. Pre-developed schemes: discontinuous semantic flow; embedded audio cues ignored. Computation of six types of features; importance evaluation; assembling important events.
  • Slide 7
  • Content Pre-analysis. Shot detection: color histogram-based approach. Keyframe extraction: the first and last frames. Audio content: classification into silence, speech, music, and environmental sounds. Visual content: detection of human faces.
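The histogram-based shot detection and first/last keyframe selection named on this slide can be sketched as follows. This is a minimal illustration, not the presenter's implementation: the frame representation (a list of RGB tuples), the L1 histogram distance, the bin count, and the threshold value are all assumptions made for the example.

```python
def color_histogram(frame, bins=4):
    """Normalized RGB histogram of a non-empty frame given as (r, g, b) pixels in 0..255."""
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in frame:
        idx = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[idx] += 1.0
    total = float(len(frame))
    return [h / total for h in hist]


def histogram_distance(h1, h2):
    """L1 distance between two normalized histograms (range 0..2)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def detect_shot_boundaries(frames, threshold=0.5):
    """Return every index i where frames[i] starts a new shot.

    The threshold is a hypothetical value chosen for illustration.
    """
    hists = [color_histogram(f) for f in frames]
    return [i for i in range(1, len(hists))
            if histogram_distance(hists[i - 1], hists[i]) > threshold]


def keyframes(num_frames, boundaries):
    """(first, last) frame index of each shot, used as its keyframes."""
    starts = [0] + list(boundaries)
    ends = [b - 1 for b in boundaries] + [num_frames - 1]
    return list(zip(starts, ends))
```

Calling `detect_shot_boundaries` on a frame sequence and passing the result to `keyframes` yields one (first, last) pair per shot.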
  • Slide 8
  • Movie Event Extraction. Events develop thematic topics through actions or dialogs. What to extract? Two-speaker dialogs; multiple-speaker dialogs; hybrid events.
  • Slide 9
  • Movie Event Extraction. How to extract? Shot sink computation: grouping close and similar shots. Sink clustering and characterization: periodic, partly-periodic, non-periodic. Event extraction and classification. Post-processing.
  • Slide 10
  • Shot Sink Computation. Shot sink: a pool of close and similar shots. Uses visual information. Window-based sweep algorithm.
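The slide names a window-based sweep but gives no details, so the following is only one plausible reading: starting from a seed shot, look at the next few shots, keep those that are visually similar to the seed, then slide the window forward to the last shot that was kept. The per-shot feature vectors, the L1 distance, the window length, and the threshold are all hypothetical.

```python
def l1(a, b):
    """L1 distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def shot_sink(features, start, window=4, threshold=0.3):
    """Collect the indices of shots that are close to and similar to shot `start`.

    features: one visual feature vector per shot (for example a keyframe histogram).
    window, threshold: illustrative values, not taken from the slides.
    """
    sink = [start]
    anchor = start
    while True:
        candidates = range(anchor + 1, min(anchor + 1 + window, len(features)))
        similar = [j for j in candidates
                   if l1(features[start], features[j]) < threshold]
        if not similar:
            break  # no similar shot inside the window: the sink is complete
        sink.extend(similar)
        anchor = similar[-1]  # continue the sweep from the last similar shot
    return sink
```

With alternating shots such as A, B, A, B, A the sink of the first shot collects every A shot, which is the pattern a two-speaker dialog would produce.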
  • Slide 11
  • Shot Sink Clustering. Clustering and characterizing: periodic, partly-periodic, non-periodic; degree of shot repetition. Determining the sink periodicity: calculate the relative temporal distance; compute the mean μ and standard deviation σ; group with the K-means algorithm.
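A rough sketch of the periodicity step: compute the mean and standard deviation of the temporal gaps inside each sink, then group the resulting (mean, std) points with K-means. Treating the "relative temporal distance" as the gap between consecutive shot indices, and the plain Lloyd's-algorithm K-means with caller-supplied initial centers, are assumptions made for this example.

```python
import math


def temporal_stats(sink):
    """Mean and standard deviation of the gaps between consecutive shots in a sink.

    The sink must contain at least two shot indices.
    """
    gaps = [b - a for a, b in zip(sink, sink[1:])]
    mean = sum(gaps) / len(gaps)
    var = sum((g - mean) ** 2 for g in gaps) / len(gaps)
    return mean, math.sqrt(var)


def kmeans(points, centers, iterations=20):
    """Plain Lloyd's algorithm; returns the cluster label of each point."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest center by Euclidean distance.
        labels = [min(range(len(centers)),
                      key=lambda k: math.dist(p, centers[k]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels
```

A sink with a small standard deviation repeats at regular intervals, so with three clusters the labels can be read as periodic, partly-periodic, and non-periodic once the clusters are ordered by their spread.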
  • Slide 12
  • Slide 13
  • Integrating Speech & Face Information. False alarms: montage presentation -> spoken dialog; multiple-speaker dialog -> two-speaker dialog. Solutions for reducing false alarms: integrating embedded audio information (speech shot ratio calculation); including facial cues (face detection).
  • Slide 14
  • Adaptive Speaker Identification. Shot detection & audio classification; face detection & mouth tracking; speech segmentation / clustering; initial speaker modeling; audiovisual-based speaker identification; unsupervised speaker model adaptation.
  • Slide 15
  • Slide 16
  • Face Detection & Mouth Tracking. Detection and recognition of talking faces. Distance between eyes and mouth: dist. Eye positions: (x1, y1), (x2, y2). Mouth center: (x, y).
  • Slide 17
  • Speech Segmentation
  • Slide 18
  • Speech Clustering. Two separate segments X1, X2; joined segment X = {X1, X2}. For a cluster C that has n homogeneous speech segments, the distance Dist(X, C) is computed; a negative value -> X is considered to be from the same speaker.
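The expression for Dist(X, C) did not survive in the transcript. A common choice with the same sign convention is a BIC-style distance, sketched below under several assumptions: each segment is modeled by a single Gaussian with diagonal covariance, the cluster is represented by pooling its segments into one list of frames, and `penalty_weight` is a hypothetical tuning parameter. This is a generic illustration, not the formula from the slide.

```python
import math


def log_det_diag(segment):
    """Log-determinant of the diagonal covariance of a list of feature vectors."""
    n = len(segment)
    total = 0.0
    for dim in zip(*segment):
        mean = sum(dim) / n
        var = sum((v - mean) ** 2 for v in dim) / n
        total += math.log(var + 1e-9)  # small floor avoids log(0)
    return total


def delta_bic(x1, x2, penalty_weight=1.0):
    """BIC-style distance between two segments.

    Negative: one Gaussian explains the joined data well enough (same speaker).
    Positive: two separate Gaussians are preferred (different speakers).
    """
    joined = x1 + x2
    n1, n2, n = len(x1), len(x2), len(joined)
    d = len(joined[0])
    gain = 0.5 * (n * log_det_diag(joined)
                  - n1 * log_det_diag(x1)
                  - n2 * log_det_diag(x2))
    # A diagonal Gaussian has 2 * d free parameters (d means, d variances).
    penalty = penalty_weight * 0.5 * (2 * d) * math.log(n)
    return gain - penalty
```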
  • Slide 19
  • Initial Speaker Modeling. Required for the identification process. Exploits the inter-relations between facial and speech cues. For each target cast member A: find a speech shot where A is talking; collect all the speech segments; build the initial model as a Gaussian Mixture Model (GMM).
  • Slide 20
  • Likelihood-based Speaker Identification. GMM model notation: mixture components j = 1, 2, ..., m for the ith enrolled speaker. The log likelihood between X and M_i.
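As an illustration of likelihood-based identification, the sketch below scores a speech segment X against each enrolled model M_i and picks the best one. It assumes diagonal-covariance GMMs, frames given as plain lists of floats, and a hypothetical `models` dictionary; the slides do not specify any of these details.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def gmm_log_likelihood(frames, weights, means, variances):
    """Sum over frames of log sum_j w_j * N(x | mean_j, var_j)."""
    total = 0.0
    for x in frames:
        terms = [math.log(w) + log_gaussian_diag(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        peak = max(terms)  # log-sum-exp for numerical stability
        total += peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total


def identify_speaker(frames, models):
    """models: name -> (weights, means, variances). Returns the best-scoring name."""
    return max(models, key=lambda name: gmm_log_likelihood(frames, *models[name]))
```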
  • Slide 21
  • Audiovisual Integration for Speaker Identification. Finalizes the speaker identification task by integrating audio and video cues. Examine the existence of temporal overlap: if overlap ratio > threshold, assign the face vector to the cluster; otherwise, set the face vector to null. Speaker identity.
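The overlap test on this slide can be sketched as below. Normalizing the overlap by the speech segment's duration, the (start, end) interval representation, and the threshold value are assumptions; the slide only states that the ratio is compared with a threshold.

```python
def overlap_ratio(speech, face):
    """Fraction of the speech interval (start, end) covered by the face interval."""
    start = max(speech[0], face[0])
    end = min(speech[1], face[1])
    overlap = max(0.0, end - start)
    return overlap / (speech[1] - speech[0])


def assign_face(speech, face_tracks, threshold=0.5):
    """Return the face vector of the best-overlapping talking face, or None.

    face_tracks: list of ((start, end), face_vector) pairs (hypothetical layout).
    None plays the role of the "null" face vector on the slide.
    """
    best_vector, best_ratio = None, threshold
    for interval, vector in face_tracks:
        ratio = overlap_ratio(speech, interval)
        if ratio > best_ratio:
            best_vector, best_ratio = vector, ratio
    return best_vector
```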
  • Slide 22
  • Unsupervised Speaker Model Adaptation. Updating the speaker model. Three approaches: average-based, MAP-based, and Viterbi-based model adaptation.
  • Slide 23
  • Average-based Model Adaptation. Compute BIC distances. Compare d_min with the threshold T (cases d_min < T and d_min > T). Initialize a new mixture component. Update the weight for each component.
  • Slide 24
  • MAP-based Model Adaptation. μ_i: mean of b_i. L_i: occupation likelihood of the adaptation data. μ̄: mean of the observed adaptation data.
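The update formula itself is missing from the transcript. The symbols listed match the widely used MAP mean update, in which the adapted mean is a weighted average of the prior mean and the mean of the adaptation data. The sketch below shows that standard form as an assumption; in particular the prior weight `tau` does not appear on the slide and its default value is arbitrary.

```python
def map_adapt_mean(prior_mean, data_mean, occupation, tau=10.0):
    """Standard MAP mean update (assumed form, not confirmed by the slides).

    prior_mean: current component mean (mu_i on the slide)
    data_mean:  mean of the observed adaptation data (mu-bar)
    occupation: occupation likelihood of the adaptation data (L_i)
    tau:        weight of the prior model (hypothetical parameter)
    """
    w = occupation / (occupation + tau)
    return [w * x + (1.0 - w) * m for x, m in zip(data_mean, prior_mean)]
```

With little adaptation data (small `occupation`) the result stays close to the prior mean; with plenty of data it moves toward the observed mean.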
  • Slide 25
  • Viterbi-based Model Adaptation. Allows different feature vectors to come from different components. Hard decision: any vector either occupies a component or does not. Uses an indicator function instead of a probability function for each mixture component.
  • Slide 26
  • Event-based Movie Skimming. Event feature extraction: six types of mid- to high-level features. Evaluation of importance. Movie skim generation: assemble major events -> final skim.
  • Slide 27
  • Event Feature Extraction. Music ratio. Speech ratio. Sound loudness. Action level. Normalized by dividing by the largest value. Present cast. Theme topic.
  • Slide 28
  • Event Feature Extraction. M: number of features extracted. N: number of events. a_{i,j}: value of the jth feature in the ith event.
  • Slide 29
  • Movie Skim Generation. Choosing important events: the user's feature preference; event importance vector.
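One way to read the last three slides together: the N x M feature matrix a_{i,j} is normalized per feature by its largest value, a user preference vector weights the features, and the events with the highest scores are assembled into the skim. The weighted-sum scoring and the top-k selection below are assumptions for illustration; the slides do not give the exact importance formula.

```python
def normalize_columns(matrix):
    """Divide every feature (column) by its largest value across events."""
    maxima = [max(col) for col in zip(*matrix)]
    return [[v / m if m else 0.0 for v, m in zip(row, maxima)]
            for row in matrix]


def event_importance(matrix, preference):
    """Importance of each event as a preference-weighted sum of its features."""
    return [sum(a * p for a, p in zip(row, preference)) for row in matrix]


def select_events(importance, k):
    """Indices of the k most important events, returned in temporal order."""
    top = sorted(range(len(importance)),
                 key=lambda i: importance[i], reverse=True)[:k]
    return sorted(top)
```

Returning the selected events in temporal order keeps the skim's story line in sequence.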
  • Slide 30
  • Event Detection Results. Correctness of the event classification. System performance evaluation. Hybrid class excluded.
  • Slide 31
  • Slide 32
  • Speaker Identification Results. Evaluation of the adaptive speaker identification system: false acceptance (FA), false rejection (FR), identification accuracy (IA).
  • Slide 33
  • Slide 34
  • Average-based, MAP-based, Viterbi-based
  • Slide 35
  • Slide 36
  • Movie Skimming Results. Difficulties of qualitative evaluation. Quantitative measure based on a user study with a 5-point scale (1~5): visual comprehension, audio comprehension, semantic continuity, good abstraction, quick browsing, video skipping.