Video summarization by graph optimization
Lu Shi
Oct. 7, 2003
Outline
  Introduction
  Goals
  Stage I: Candidate video shot selection
    Video segmentation
    Video feature detection
    Candidate video shots
  Stage II: Graph based video summary generation
    Dissimilarity function
    Spatial-temporal relation graph
    Optimization
  Experiments and Results
  Conclusion & Future Work
Introduction
Motivation
  A huge volume of video data is distributed over the Web.
  How can we help the user grasp the content of a video quickly?
  When bandwidth is narrow, how should the video be presented to the user?
Applications
  Video skimming (dynamic)
  Static story board (static)
Goals
Criteria for a video summary
  Conciseness: the video skimming should not exceed the given target length.
  Comprehensive coverage: both the visual diversity and the temporal distribution of the original video should be covered.
  Visual coherence: the video skimming should not be too jumpy.
Stage I: Candidate shot selection
Video segmentation
  A video shot is an unbroken sequence of images recorded continuously by a camera.
  The content of a video shot can be represented by key frames (e.g. the first and last frames).
  A video sequence is formed by a series of video shots.
  Video shots can be detected by various video segmentation methods.
Stage I: Candidate shot selection
Video segmentation
  Middle slice image (formed by concatenating the center lines of the video frames)
  Calculate the minimal pixel difference between rows
  Filtering and thresholding
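The three steps above can be sketched in plain Python. This is only an illustration under several assumptions the slides do not confirm: frames are grayscale nested lists, the "center line" is the middle row of each frame, the "minimal pixel difference" is the smallest mean absolute difference over a few pixel shifts, and "filtering" means keeping only local maxima. The names `middle_slice`, `row_difference`, `detect_cuts`, `threshold` and `max_shift` are hypothetical.

```python
def middle_slice(frames):
    """Stack the center line of every frame; row t comes from frame t."""
    return [frame[len(frame) // 2] for frame in frames]


def row_difference(a, b, max_shift=2):
    """Smallest mean absolute difference between two rows over small shifts.

    Trying a few shifts is an assumption: it tolerates slight camera motion.
    """
    best = float("inf")
    for shift in range(-max_shift, max_shift + 1):
        pairs = [(a[i], b[i + shift]) for i in range(len(a))
                 if 0 <= i + shift < len(b)]
        if pairs:
            mean_abs = sum(abs(x - y) for x, y in pairs) / len(pairs)
            best = min(best, mean_abs)
    return best


def detect_cuts(frames, threshold=30.0, max_shift=2):
    """Return frame indices at which a new shot is assumed to start."""
    rows = middle_slice(frames)
    diffs = [row_difference(rows[t - 1], rows[t], max_shift)
             for t in range(1, len(rows))]
    cuts = []
    for i, d in enumerate(diffs):
        left = diffs[i - 1] if i > 0 else float("-inf")
        right = diffs[i + 1] if i + 1 < len(diffs) else float("-inf")
        # Filtering (local maximum) followed by thresholding.
        if d > threshold and d >= left and d > right:
            cuts.append(i + 1)  # diffs[i] compares frame i with frame i + 1
    return cuts
```

For example, five dark frames followed by five bright frames produce a single cut at the first bright frame.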
Stage I: Candidate shot selection
Video feature detection
  Face detection
  Voice and noise detection
  Audio volume
  Specific colors (fire, etc.)
  Text captions
Features indicate interesting content that should be considered for inclusion in the summary.
Stage I: Candidate shot selection
Select candidate shots
  Shots with interesting features extracted
  Any combination of the extracted features can be used
  Adjacent candidate shots can be merged into video shot clusters to increase visual coherence
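A minimal sketch of candidate selection and merging, assuming each shot carries a set of detected feature labels and that "adjacent" means consecutive in the original shot order. The function names and the feature labels are hypothetical.

```python
def select_candidates(shot_features, wanted, require_all=False):
    """Return indices of shots whose detected features match `wanted`.

    require_all=False keeps shots with any wanted feature;
    require_all=True keeps only shots that show all of them.
    """
    wanted = set(wanted)
    picked = []
    for idx, feats in enumerate(shot_features):
        feats = set(feats)
        hit = wanted <= feats if require_all else bool(wanted & feats)
        if hit:
            picked.append(idx)
    return picked


def merge_adjacent(candidate_indices):
    """Group runs of consecutive shot indices into shot clusters."""
    clusters = []
    for idx in sorted(candidate_indices):
        if clusters and idx == clusters[-1][-1] + 1:
            clusters[-1].append(idx)   # extends the current run
        else:
            clusters.append([idx])     # starts a new cluster
    return clusters
```

A single isolated candidate becomes a cluster of one shot, so later stages can treat shots and clusters uniformly.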
Stage II: Graph modeling
Video shot pairwise dissimilarity function
  Visual (spatial) similarity: histogram correlation between key frames
  Temporal distance: the distance between shot center points
Definition
\[ \mathrm{Dis}(sh_i, sh_j) = 1 - \mathrm{VisualSim}(sh_i, sh_j) \cdot e^{-k \cdot \mathrm{TemporalDis}(sh_i, sh_j)} \]
Stage II: Graph modeling
Video shot pairwise dissimilarity function
  Linear in the visual dissimilarity
  Exponential in the temporal distance, to approximate the user's memory (k = 400 in the experiment)
Definition
\[ \mathrm{Dis}(sh_i, sh_j) = 1 - \mathrm{VisualSim}(sh_i, sh_j) \cdot e^{-k \cdot \mathrm{TemporalDis}(sh_i, sh_j)} \]
A similar definition is used for video clusters.
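A small sketch of the pairwise dissimilarity, reading the definition as Dis = 1 - VisualSim * exp(-k * TemporalDis). Several details are assumptions rather than facts from the slides: histogram correlation is taken to be the Pearson correlation clipped at zero, the temporal distance is normalized by the video length, and the `Shot` class with its fields is hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Shot:
    start: int        # first frame of the shot
    end: int          # last frame of the shot
    histogram: list   # key-frame color histogram (bin counts)


def histogram_correlation(h1, h2):
    """Pearson correlation between two histograms of equal length."""
    n = len(h1)
    m1, m2 = sum(h1) / n, sum(h2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(h1, h2))
    var1 = sum((a - m1) ** 2 for a in h1)
    var2 = sum((b - m2) ** 2 for b in h2)
    if var1 == 0 or var2 == 0:
        return 0.0
    return cov / math.sqrt(var1 * var2)


def visual_sim(a, b):
    # Assumption: negative correlation counts as "no similarity".
    return max(0.0, histogram_correlation(a.histogram, b.histogram))


def temporal_dis(a, b, video_length):
    # Assumption: distance between shot centers as a fraction of the video.
    return abs((a.start + a.end) / 2 - (b.start + b.end) / 2) / video_length


def dissimilarity(a, b, video_length, k=400.0):
    """Dis = 1 - VisualSim * exp(-k * TemporalDis)."""
    return 1.0 - visual_sim(a, b) * math.exp(-k * temporal_dis(a, b, video_length))
```

With this reading, two visually identical shots still become dissimilar as they move apart in time, which matches the "user's memory" motivation on the slide.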
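Stage II: Graph modeling
Video shot cluster pairwise dissimilarity function
Between one video shot and one video shot cluster
\[ \mathrm{Dis}(sh_i, sc_j) = \sum_{sh_x \in sc_j} \mathrm{Dis}(sh_i, sh_x) \cdot \frac{\mathrm{length}(sh_x)}{\mathrm{length}(sc_j)} \]
Between two shot clusters
\[ \mathrm{Dis}(sc_i, sc_j) = \sum_{sh_y \in sc_i} \mathrm{Dis}(sh_y, sc_j) \cdot \frac{\mathrm{length}(sh_y)}{\mathrm{length}(sc_i)} \]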
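The two cluster-level definitions are length-weighted averages of shot-level dissimilarities, which a short sketch makes concrete. It assumes that the length of a cluster is the sum of its shots' lengths and that a shot-level function `dis(a, b)` and a function `length(shot)` are supplied by the caller; all names here are hypothetical.

```python
def shot_cluster_dis(shot, cluster, dis, length):
    """Dis(sh_i, sc_j): average of Dis(sh_i, sh_x) over sh_x in sc_j,
    weighted by length(sh_x) / length(sc_j)."""
    cluster_length = sum(length(s) for s in cluster)  # assumed cluster length
    return sum(dis(shot, s) * length(s) for s in cluster) / cluster_length


def cluster_cluster_dis(cluster_i, cluster_j, dis, length):
    """Dis(sc_i, sc_j): average of Dis(sh_y, sc_j) over sh_y in sc_i,
    weighted by length(sh_y) / length(sc_i)."""
    cluster_length = sum(length(s) for s in cluster_i)
    return sum(
        shot_cluster_dis(s, cluster_j, dis, length) * length(s)
        for s in cluster_i
    ) / cluster_length
```

When both clusters contain a single shot, the result reduces to the plain shot-to-shot dissimilarity, so shots and clusters can share one graph.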
Stage II: Graph modeling
Model the candidate shot set as a directed graph G(V, E) that conveys both the spatial and the temporal properties of the video.
  A vertex v_i corresponds to a video shot; the weight on the vertex is the shot's length.
  An edge e_ij corresponds to the dissimilarity between video shot i and shot j.
Stage II: Graph modeling
The real shot/cluster pairwise dissimilarity function
Stage II: Graph based video summary generation
Video skimming generation
  Given: a target video skimming length SummaryLength.
  A path in the spatial-temporal relation graph corresponds to a set of video shots.
  The objective function is the length of the path.
  Find the longest path, subject to the constraint that the sum of the vertex weights on the path lies within [SummaryLength - threshold, SummaryLength].
Stage II: Graph based video summary generation
Optimal substructure
  We denote the state as (ThisShot, LeftSize).
  The optimal substructure is:
\[ \mathrm{opt}(ThisShot, LeftSize) = \max_{ThisShot < NextShot \le ShotNum - 1} \Big( \mathrm{opt}\big(NextShot,\; LeftSize - \mathrm{length}(NextShot)\big) + \mathrm{Dis}(sh_{ThisShot}, sh_{NextShot}) \Big) \]
  If LeftSize is too small, then opt(ThisShot, LeftSize) = 0.
  We can then use dynamic programming to find the best solution.
Stage II: Graph based video summary generation
Dynamic programming
  Set opt(LastShot, 0..threshold) to 0.
  Set opt(LastShot, threshold+1..SummaryLength) to -X.
  Calculate opt(ThisShot, LeftSize) with the optimal substructure equation, for ThisShot from LastShot-1 down to 0.
  Get opt(0, SummaryLength), which is the length of the longest path, then trace back to find the path.
  Time complexity: \( n^2 \cdot SummaryLength \)
  Space complexity: \( n \cdot SummaryLength \)
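The procedure above can be sketched as a bottom-up table fill. This is one possible reading, not the author's implementation: it assumes integer shot lengths, that the path always starts at shot 0 and that shot 0's length is charged before the recursion begins, that a path may stop at any shot once the unused length is at most `threshold`, and that -X is negative infinity. The function and parameter names are hypothetical.

```python
NEG_INF = float("-inf")  # plays the role of -X for infeasible states


def longest_path_summary(lengths, dis, summary_length, threshold):
    """Return (path_length, shot_indices) of the best summary starting at shot 0.

    lengths: positive integer length of each shot, in temporal order.
    dis:     callable dis(i, j) giving the edge weight from shot i to shot j.
    """
    n = len(lengths)
    budget = summary_length - lengths[0]   # assumption: shot 0 is always used
    if budget < 0:
        return NEG_INF, []
    opt = [[NEG_INF] * (budget + 1) for _ in range(n)]
    nxt = [[None] * (budget + 1) for _ in range(n)]
    for this in range(n - 1, -1, -1):
        for left in range(budget + 1):
            # Stopping here is allowed only if little enough length is unused.
            best = 0.0 if left <= threshold else NEG_INF
            choice = None
            for cand in range(this + 1, n):
                rest = left - lengths[cand]
                if rest < 0 or opt[cand][rest] == NEG_INF:
                    continue
                value = opt[cand][rest] + dis(this, cand)
                if value > best:
                    best, choice = value, cand
            opt[this][left] = best
            nxt[this][left] = choice
    if opt[0][budget] == NEG_INF:
        return NEG_INF, []
    # Trace back the chosen successors to recover the path.
    path, this, left = [0], 0, budget
    while nxt[this][left] is not None:
        this = nxt[this][left]
        left -= lengths[this]
        path.append(this)
    return opt[0][budget], path
```

The three nested loops visit every (ThisShot, LeftSize, NextShot) combination once, which gives the n^2 * SummaryLength running time, and the two tables account for the n * SummaryLength memory.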
Stage II: Graph based video summary generation
Video skimming generation
  The generated video skimming based on video shots and video shot clusters is shown below (SummaryLength = 1500, Video Length = 11479).
Stage II: Graph based video summary generation
Static video story board generation The static video story board is generated with the key
frames of the skimming video shots.
Stage II: Graph based video summary generation
Evaluation
  The generated video skimming captures both the visual diversity and the temporal coverage.
  A large-scale subjective test has not been carried out yet (does it make sense?).
  Quantitative objective evaluation is a big problem.
Future work
Combine with video structure
  V-Toc (video table of contents)
  Video shot groups
  Video scenes
Future work Video structure
Video shot group and video scene
Q & A
Thank you!