Robust Region-of-Interest Determination Based on User Attention Model Through Visual Rhythm Analysis

M9915026 M9915048 M9915044 M9915902 M9915016 M9915081 M9907513

Outline

Introduction
Visual Rhythm and User Attention Model
ROI Determination Through User Attention Model
FMO-aware ROI Determination for H.264/AVC Video Coding
Experimental Results
Conclusion


Introduction

ROI determination is required for video data transmission.
Moving objects attract the user's focus and serve as ROIs across consecutive frames, but detecting them is computationally intensive.
Visual rhythm can describe the characteristics of video content.
This work determines the ROI from attention models obtained through visual rhythm analysis.


Visual Rhythm

Visual rhythm can efficiently capture the temporal information of a video.

Visual Rhythm

[Figure: a video frame of width m and height n with its diagonal and anti-diagonal sampling lines.]

m : width of a video frame
n : height of a video frame
rd : the pixel sampling ratio for the diagonal line
ra : the pixel sampling ratio for the anti-diagonal line

Sampling lines: Diagonal (D), Anti-diagonal (A), Vertical (V), Horizontal (H).

Visual Rhythm

Di represents the gray scale values of the diagonal sampling pixels in the ith frame.
Ai represents the gray scale values of the anti-diagonal sampling pixels in the ith frame.
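As a minimal sketch of how Di and Ai might be collected, the code below samples one line of gray values per frame and stacks the lines over time. It assumes each frame is a list of rows of gray values, and it uses a single `ratio` parameter as a stand-in for rd and ra; the function names and the exact sampling rule are illustrative assumptions, not taken from the slides.

```python
def sample_line(frame, anti=False, ratio=1.0):
    """Sample gray values along the diagonal (or anti-diagonal) of one frame.

    frame: list of rows, each a list of gray values (n rows, m columns).
    ratio: fraction of the frame height used as the number of samples
           (an assumed stand-in for the sampling ratios rd and ra).
    """
    n, m = len(frame), len(frame[0])
    count = max(1, round(n * ratio))
    samples = []
    for k in range(count):
        # Position along the line, from 0.0 (top) to 1.0 (bottom).
        t = k / (count - 1) if count > 1 else 0.0
        y = round(t * (n - 1))
        x = round(t * (m - 1))
        if anti:
            x = (m - 1) - x  # mirror horizontally for the anti-diagonal
        samples.append(frame[y][x])
    return samples


def visual_rhythm(frames, anti=False, ratio=1.0):
    """Stack the sampled lines of all frames; column i holds Di (or Ai)."""
    columns = [sample_line(f, anti, ratio) for f in frames]
    # Transpose so rows index position along the line and columns index time.
    return [list(row) for row in zip(*columns)]
```

Each column of the returned image corresponds to one frame, so horizontal structure in the image reflects change over time.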

User Attention Models

Visual rhythm images can be categorized into six attention models.

User Attention Models (Horizontal)

User Attention Models (Vertical)

User Attention Models (Expanding)

User Attention Models (Absorbing)

User Attention Models (Diagonal)

User Attention Models (Anti-diagonal)

User Attention Models (Possible Events)

Attention models: horizontal, vertical, expanding, absorbing, diagonal, anti-diagonal.
Sampling lines: diagonal, anti-diagonal, horizontal, vertical.

Note: no single sampling line can represent all events across the six attention models.

DEMO


ROI Determination

Using four sampling lines yields an efficient attention model that characterizes the events in a video and avoids false alarms. The center-crossed diagonal and anti-diagonal sampling lines are first used to analyze the attention model of the current frame, and then the vertical and horizontal sampling lines are integrated to derive the final user attention model and obtain the ROI.

ROI Determination

1. Visual rhythm creation
2. Difference calculation
3. Visual rhythm history
4. Binary thresholding
5. Morphological merging

ROI Determination

Fig. 4. Visual rhythms of diagonal and anti-diagonal sampling lines acquired from Salesman QCIF sequence with 176 frames. (a) Diagonal and (b) anti-diagonal.

Fig. 5. Visual rhythm difference images acquired from Fig. 4. (a) Diagonal and (b) anti-diagonal.

Difference calculation

Clearly, the variation of the visual rhythms embeds significant information about object movement.
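The difference step could be sketched as follows. This assumes the difference image is the absolute change between the sampled lines of consecutive frames, which is a common choice but is not confirmed by the slides; `rhythm_difference` is a hypothetical name.

```python
def rhythm_difference(rhythm):
    """Absolute change between consecutive columns (frames) of a visual rhythm.

    rhythm: list of rows; rhythm[p][i] is the gray value at line position p
            in frame i. The first column has no predecessor and is set to 0.
    """
    diff = []
    for row in rhythm:
        out = [0]
        for i in range(1, len(row)):
            out.append(abs(row[i] - row[i - 1]))
        diff.append(out)
    return diff
```

Static background produces zeros, while positions crossed by a moving object produce non-zero entries.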

Fig. 6. Visual rhythm historical images acquired from Fig. 5. (a) Diagonal and (b) anti-diagonal.

Visual rhythm history

The visual rhythm history is built according to the variation of the visual rhythm.

The threshold is calculated by averaging the historical values, which stand for the variation of the visual rhythm.

Fig. 7. Binarized images derived from Fig. 6 by the thresholding process of the historical statistics. (a) Diagonal and (b) anti-diagonal.

Binary thresholding

The binary image marks pixels according to the magnitude of their variation.
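A rough sketch of the history and thresholding steps is given below. The threshold follows the description above (the average of the historical values). The history rule itself is an assumption: here each position simply sums its differences over the last `window` frames, and both `window` and the function names are hypothetical.

```python
def rhythm_history(diff, window=3):
    """Sum each position's differences over the last `window` frames."""
    history = []
    for row in diff:
        out = []
        for i in range(len(row)):
            start = max(0, i - window + 1)
            out.append(sum(row[start:i + 1]))
        history.append(out)
    return history


def binarize(history):
    """Threshold the history image at the average of its values."""
    values = [v for row in history for v in row]
    threshold = sum(values) / len(values)
    binary = [[1 if v > threshold else 0 for v in row] for row in history]
    return binary, threshold
```

Because the threshold is derived from the data, no fixed gray-level constant has to be tuned per sequence.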

Morphological merging

Illustrations of the proposed merging steps.

Images of the scopes of user attention in the diagonal and anti-diagonal visual rhythms. (a) Diagonal and (b) anti-diagonal.
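The exact merging steps are shown only as illustrations, so the following is a generic sketch rather than the proposed procedure. It treats one column of the binarized image as a 1-D sequence, closes small gaps between runs of ones, and reports the longest merged run as the scope of attention. The `max_gap` parameter and both function names are assumptions.

```python
def merge_runs(bits, max_gap=2):
    """Fill gaps of at most `max_gap` zeros that lie between two runs of ones."""
    merged = list(bits)
    last_one = None  # index of the most recent 1 seen so far
    for i, b in enumerate(bits):
        if b:
            if last_one is not None and 0 < i - last_one - 1 <= max_gap:
                for j in range(last_one + 1, i):
                    merged[j] = 1
            last_one = i
    return merged


def longest_run(bits):
    """Return (start, end) of the longest run of ones, or None if there is none."""
    best, start = None, None
    for i, b in enumerate(list(bits) + [0]):  # sentinel closes a trailing run
        if b and start is None:
            start = i
        elif not b and start is not None:
            if best is None or i - start > best[1] - best[0] + 1:
                best = (start, i - 1)
            start = None
    return best
```

Applying `merge_runs` before `longest_run` keeps a single object from being split into fragments by isolated zeros.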

Center of Visual Rhythm

Vertical and Horizontal Visual Rhythms

Vertical Horizontal


FMO-AWARE ROI DETERMINATION FOR H.264/AVC VIDEO CODING

Flexible macroblock ordering (FMO) was introduced in H.264/AVC as a new error resilience tool and can be used for ROI video coding as well.

In the H.264/AVC reference software JM 13.2, the FMO functionality supports eight slice ordering numbers, from 0 to 7, with 0 having the highest priority. Thus, the ROI determination, which is followed by the FMO technique in H.264/AVC, classifies the MBs into three slices from 0 to 2.

Skin Color Extraction and Visual Rhythm ROI Determination

Since human faces are usually the loci of attention in conversations, human faces should be regarded as ROI regions in the implementation.
Here, both the skin color extraction and the visual rhythm ROI determination schemes can detect ROI areas.
Fig. 16 shows the results of each step in the proposed FMO-aware ROI determination.

In Fig. 16(b) and (d), the skin color pixels are extracted and then categorized into a macroblock-based image, respectively.
Then Fig. 16(e) sketches the contour of the user attention region from the result of Fig. 16(c).
Fig. 16(d) and (e) illustrate the individual ROI results in terms of white and black macroblocks, where white macroblocks represent the ROI region.
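The slides do not state which skin color rule is used, so the sketch below substitutes a commonly cited chrominance range (Cb in 77..127, Cr in 133..173) purely as an assumption. It then shows how pixel-level skin flags might be grouped into a macroblock-based image. The `min_fraction` rule, the full-resolution Cb/Cr planes, and the function names are all hypothetical.

```python
def is_skin(cb, cr):
    """Chrominance-range skin test (assumed thresholds, not from the slides)."""
    return 77 <= cb <= 127 and 133 <= cr <= 173


def skin_macroblocks(cb_plane, cr_plane, mb_size=16, min_fraction=0.5):
    """Mark each macroblock 1 (ROI) when enough of its pixels look like skin."""
    height, width = len(cb_plane), len(cb_plane[0])
    mb_rows, mb_cols = height // mb_size, width // mb_size
    result = []
    for r in range(mb_rows):
        row = []
        for c in range(mb_cols):
            skin = 0
            for y in range(r * mb_size, (r + 1) * mb_size):
                for x in range(c * mb_size, (c + 1) * mb_size):
                    if is_skin(cb_plane[y][x], cr_plane[y][x]):
                        skin += 1
            row.append(1 if skin >= min_fraction * mb_size * mb_size else 0)
        result.append(row)
    return result
```

The output has one flag per macroblock, matching the white (ROI) and black (non-ROI) macroblock images described above.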

FMO-AWARE ROI DETERMINATION

Extended ROI Macroblocks

In implementations, ROI regions do not always stay in the same position across a sequence, and a macroblock may change its ROI status between two consecutive frames.
Therefore, the variation in generated bits increases when a macroblock changes from a non-ROI region in the previous frame to an ROI region in the current frame.
Moreover, the visual quality suffers from obvious artifacts at the boundary between ROI macroblocks and non-ROI ones.
However, it is observed in [24], [25] that an extended region around the ROI regions helps reduce the artifacts while ensuring that regions with targets are not missed.
Therefore, in our implementation the extended ROI macroblocks are centered on the ROI regions obtained above. Fig. 16(f) and (g) illustrate the extended ROI regions, marked in gray.

ROI Scoreboard for FMO

To create a scoreboard of ROI macroblocks, points are given to classify the category of each macroblock.
A macroblock located in the background gets two points. A macroblock that belongs to an extended region, in either the spatial or the temporal domain, gets one point. Otherwise, a macroblock that belongs to the ROI region gets zero points.
As illustrated in Fig. 16(h), each macroblock takes its score from the lookup table in Table IV and is then arranged into five distinct ordered slices. Fig. 16(i) overlays the ROI scoreboard of Fig. 16(h) on the original frame to show the location of the corresponding slices in the frame.

ROI Scoreboard for FMO

The higher the score, the less important a macroblock is in a frame.

Corresponding score lookup table
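The scoring rule for a single ROI mask can be sketched as below. It uses the point values stated above (0 for ROI, 1 for extended, 2 for background). It assumes that the spatial extension is a one-macroblock border around the ROI and that the temporal extension covers macroblocks that were ROI in the previous frame. The lookup of Table IV is not reproduced, and the names are hypothetical.

```python
ROI, EXTENDED, BACKGROUND = 0, 1, 2


def roi_scoreboard(roi_mask, prev_roi_mask=None):
    """Score each macroblock: 0 for ROI, 1 for extended region, 2 for background.

    roi_mask / prev_roi_mask: lists of rows of 0/1 flags per macroblock.
    """
    rows, cols = len(roi_mask), len(roi_mask[0])
    scores = [[BACKGROUND] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if roi_mask[r][c]:
                scores[r][c] = ROI
                continue
            # Spatial extension: any of the 8 neighbours is an ROI macroblock.
            near_roi = any(
                roi_mask[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            )
            # Temporal extension: the macroblock was ROI in the previous frame.
            was_roi = bool(prev_roi_mask and prev_roi_mask[r][c])
            if near_roi or was_roi:
                scores[r][c] = EXTENDED
    return scores
```

Lower scores map to lower slice numbers, so the most important macroblocks land in the highest-priority slice.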


Experimental Results

The salesman introduces a product with his hands.


The movement of the hands is the most important region in this sequence.

Experimental Results

ROIs of Foreman sequence.


Two staff members walking in an office room.


Time Consumption Analysis of Visual Rhythm ROI Determination

Evaluated on a 1.5 GHz Pentium-M laptop with 512 MB of DDR RAM.

Implementation of H.264/AVC ROI Video Coding

Importance of each slice in FMO:
Ii : the importance factor of slice i
Ni : the number of macroblocks in slice i
n : the number of slices in a frame

Target bits bppi:
B : the target bits used for the current frame, estimated by the JM encoder

QPi for the FMO:
a and b are recommended as 14 and 0.32
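The allocation formulas themselves did not survive in this transcript. As a purely illustrative sketch, the code below splits the frame budget B across slices in proportion to Ii times Ni and converts each share to bits per pixel, assuming 16x16 luma macroblocks. This proportional rule is an assumption, not the formula from the slides, and the mapping from bppi to QPi with a and b is deliberately left out.

```python
def slice_bit_budget(importance, mb_counts, frame_bits, pixels_per_mb=256):
    """Split a frame's target bits across slices by importance-weighted size.

    importance: Ii per slice, mb_counts: Ni per slice, frame_bits: B.
    Returns a list of (bits_i, bpp_i); empty slices get (0.0, 0.0).
    """
    weights = [i * n for i, n in zip(importance, mb_counts)]
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one slice needs a positive weight")
    budget = []
    for w, n in zip(weights, mb_counts):
        bits = frame_bits * w / total
        bpp = bits / (n * pixels_per_mb) if n else 0.0
        budget.append((bits, bpp))
    return budget
```

With this rule a small but important slice receives more bits per pixel than a large background slice, which is the intended effect of ROI coding.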

Implementation of H.264/AVC ROI Video Coding




Conclusion

This paper has presented a robust ROI determination method based on user attention models through visual rhythm analysis.
It investigated the visual rhythm concept for analyzing video content to facilitate ROI determination.
Through visual rhythm, the proposed algorithm can determine the highest-potential ROI area in a fast, simple, and robust way.

Future Work

An FMO-aware ROI determination has been proposed for H.264/AVC video coding to enhance the quality of ROI regions. Building on the concept proposed in this paper, integrated applications could be developed by combining the proposed scheme with chrominance information analysis.

Thanks for listening.
