Unwrap Mosaics: A new representation for video editing

Alex Rav-Acha et al., in SIGGRAPH 2008

Presented by Lee Seong-ho (이성호), January 22, 2009

Abstract

• A new representation for video
• Modeling the image-formation process
  – From an object’s texture map to the image
    • Modulated by an object-space occlusion mask
  – Recover the “unwrap mosaic”
• Editing
• Re-compositing
  – Into the original sequence
  – Resizing objects
  – Repainting textures
  – Copying/cutting/pasting objects
  – Attaching effects layers to deforming objects

Introduction

• Unwrap mosaics
  – Support a wide range of editing operations
  – Handle deforming surfaces
  – Handle self-occlusion
  – Provide references for feature-point tracking

Reconstructing a 3D model

• Not easy
• Software packages
  – [2d3 Ltd. 2008; Thormählen and Broszio 2008]
• Extensions to nonrigid scenes
  – [Bregler et al. 2000; Brand 2001; Torresani et al. 2008]

• Less reliable under occlusion

Dense reconstruction from video

• Restricted to rigid scenes
  – [Seitz et al. 2006]
• Using interactive tools
  – [Debevec et al. 1996; van den Hengel et al. 2007]
  – Also for rigid scenes
• Triangulation of the sparse points from nonrigid structure
  – Surprisingly troublesome

Unwrap mosaics

• To recover the object’s texture map
  – Rather than its 3D shape
  – Directly from video
• The recovered texture map will be accompanied by
  – A 2D-to-2D mapping
  – A sequence of binary masks modeling occlusion
• Together these form the unwrap mosaic

• A video will typically be represented by an assembly of several unwrap mosaics
  – One per object
  – One for the background
• Edits
  – Performed on the mosaic itself
  – Without converting to 3D
• Main contribution
  – Recover the unwrap mosaic from images
    • Energy minimization procedure

• Figure 2: Reconstruction overview. Steps in constructing an unwrap mosaic representation of a video sequence. Steps 1 to 3a form an initial estimate of the model parameters, and step 3 is an energy minimization procedure which refines the model to subpixel accuracy.

Segmentation

• Segment the sequence into independently moving objects

• “video cut and paste” [Li et al. 2005]

• Allow for user interaction

Tracking

• Recover the texture map of a deforming 3D object from a sequence of 2D images
• The texture map may be assumed to be constant
  – Although the model is changing its shape
• Interest-point detection and tracking
  – [Sand and Teller 2006, for example]

Embedding

• View the sparse point tracks as a high-dimensional projection of the 2D surface parameters

Mosaic stitching

• A map from the tracked points in each image to the (u, v) parameter space
• A variation of [Agarwala et al. 2004] emerges naturally from the energy formulation

Track refinement

• The mosaic is good enough to create a reference template to match against the original frames
• Reduces any drift that may have been present after the original tracking phase

Using the model for video editing

• Edit the texture map
  – For example by drawing on it
  – Warp it via the recovered mapping
  – Combine with the other layers of the original sequence
• The re-rendered mosaic will not exactly match the original sequence, so the edit is applied as an overlay that is
  – Warped by the 2D–2D mapping
  – Masked by the occlusion masks
  – Alpha-blended with the original image
• Remove layers
  – The removed area is filled in
    • Because the mapping is defined even in occluded areas
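
The warp, mask, and blend steps above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the authors' renderer: images are grey-level nested lists, the mapping is stored as one (x, y) image position per mosaic pixel, and each mosaic pixel is splatted to its nearest image pixel. The function name render_frame and all of its parameters are hypothetical.

```python
def render_frame(mosaic, mapping, visible, frame, alpha=1.0):
    """Composite an (edited) mosaic back into one video frame.

    mosaic  : 2D list of grey levels in texture space, indexed [v][u]
    mapping : 2D list of (x, y) image positions, one per mosaic pixel
    visible : 2D list of booleans, the occlusion mask for this frame
    frame   : 2D list of grey levels, the original image, indexed [y][x]
    alpha   : blend weight of the mosaic over the original pixel
    """
    out = [row[:] for row in frame]               # leave the input frame untouched
    height, width = len(frame), len(frame[0])
    for v, mosaic_row in enumerate(mosaic):
        for u, colour in enumerate(mosaic_row):
            if not visible[v][u]:
                continue                          # masked: occluded in this frame
            x, y = mapping[v][u]                  # warp: mosaic pixel -> image position
            xi, yi = int(round(x)), int(round(y)) # nearest-pixel splat (assumption)
            if 0 <= xi < width and 0 <= yi < height:
                # alpha-blend the warped mosaic colour with the original pixel
                out[yi][xi] = alpha * colour + (1.0 - alpha) * frame[yi][xi]
    return out
```

A real implementation would resample with subpixel accuracy instead of nearest-pixel splatting; the sketch only shows the order of the three operations.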

Limitations

• Textured surfaces are required for point tracking; problem cases:
  – Low texture
  – One-dimensional textures
  – Motion blur
• The assumption of a smoothly varying 3D surface
  – Fails on objects with significant protrusions
    • The dinosaur in figure 11
• The assumption of smoothly varying lighting
  – Strong shadows will disrupt tracking
• Limited to disc-topology objects
  – A rotating cylinder will be reconstructed as a long tapestry
    • See figure 11

The unwrap mosaic model

• Image generation model
  – How an image sequence is constructed from a collection of unwrap mosaics
  – A fitting problem
• Energy minimization
  – Via nonlinear optimization
• Initial estimates can be obtained from sparse 2D tracking data for
  – The 2D-2D mapping
  – The texture map

Point-spread functions

Discrete energy formulation

Data cost

Constraints

Mapping smoothness

Visibility smoothness

Minimizing the energy

Minimizing over C: stitching

Reparametrization and embedding

Minimizing over w: dense mapping

• Simply use MATLAB’s griddata to interpolate
  – No guarantee of minimizing the original energy
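
To make the interpolation step concrete, here is a dependency-free Python sketch that spreads sparse track correspondences (u, v) -> (x, y) over a dense grid. Note the swap: MATLAB's griddata interpolates over a triangulation of the scattered points, whereas this sketch uses inverse-distance weighting purely to stay short and self-contained (in Python, scipy.interpolate.griddata is the closer analogue). The function name interpolate_mapping and its parameters are hypothetical, and the track list is assumed to be non-empty.

```python
def interpolate_mapping(track_uv, track_xy, grid_uv, power=2.0):
    """Estimate an image position (x, y) for every grid point (u, v).

    track_uv : list of (u, v) mosaic coordinates of the sparse tracks
    track_xy : list of (x, y) image positions of the same tracks in one frame
    grid_uv  : list of (u, v) points at which the dense mapping is wanted
    power    : exponent of the inverse-distance weights (assumption)
    """
    dense = []
    for gu, gv in grid_uv:
        weight_sum = x_sum = y_sum = 0.0
        exact = None
        for (u, v), (x, y) in zip(track_uv, track_xy):
            d2 = (gu - u) ** 2 + (gv - v) ** 2    # squared distance in mosaic space
            if d2 == 0.0:
                exact = (x, y)                    # grid point coincides with a track
                break
            w = 1.0 / d2 ** (power / 2.0)         # closer tracks get larger weights
            weight_sum += w
            x_sum += w * x
            y_sum += w * y
        if exact is not None:
            dense.append(exact)
        else:
            dense.append((x_sum / weight_sum, y_sum / weight_sum))
    return dense
```

As with griddata on the slide, nothing here minimizes the original energy; it only produces a smooth dense mapping that agrees with the tracks.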

Minimizing over w and b: dense mapping with occlusion

Lighting

User interaction and hinting

• Segmentation
  – The user selects objects in a small number of frames
  – The segmentation is propagated using optical flow or sparse tracks
• Mosaic coverage

Tuning parameters

• The robust kernel width τ
  – Is set to match an estimate of image noise
  – Set to 5/255 gray-levels
• Scale parameter τ3
  – 40 pixels
    • Except the face, which had many outlier tracks
• Spatial smoothness λwl
  – Controls the amount of deformation of the mapping
  – Constant for all
    • but the “boy” sequence
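
To illustrate what a kernel width of 5/255 means in practice, the Python sketch below caps each per-pixel residual at τ. Two assumptions are made here that the slides do not state: intensities are normalized to [0, 1], and the robust kernel is a truncated absolute difference min(|e|, τ), which is one common choice. The names TAU, robust_cost, and data_cost are hypothetical.

```python
TAU = 5.0 / 255.0   # kernel width from the slide, assuming intensities in [0, 1]


def robust_cost(observed, predicted, tau=TAU):
    """Truncated absolute residual: small errors count fully, outliers cost at most tau."""
    return min(abs(observed - predicted), tau)


def data_cost(observed_pixels, predicted_pixels, tau=TAU):
    """Sum of robust per-pixel costs between an observed and a predicted image row."""
    return sum(robust_cost(o, p, tau)
               for o, p in zip(observed_pixels, predicted_pixels))
```

With this kernel, a pixel that is wrong by a full grey-level range (for example a specular highlight) contributes no more than a pixel that is wrong by 5 grey levels, so a few outliers cannot dominate the fit.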

Results

• Synthetic sequence

Synthetic sequence

• There is no concept of a “ground truth”
  – Visually evaluate the recovered mosaic (figure 7)
• About 30% of mosaic pixels visible in any one frame
• Self-occlusion near the nose

Face sequence

• Few high-contrast points coupled with strong lighting

Giraffe sequence: foreground

• A logo is placed on the foreground giraffe’s back and head
• With optical flow, the annotation drifts by about 10 pixels in 30 frames
• The unwrap mosaic shows no visible drift

Boy sequence

– Both sides of his head and torso are shown
– Variable focus
– Motion blur
– Considerable foreground occlusion
• Ear is doubled
  – Could be fixed by editing as in section 4

Dinosaur sequence

• Not a smooth reparametrization of the object’s true “texture map”
• Iterative automatic segmentation
  – Might yield separate mosaics for each model component: arms, torso, tail
• The unwrap mosaic technique is deformation agnostic

Related work

• Wang and Adelson’s paper [1994]
  – About how layers might apply to self-occluding objects
• Irani et al. [1995]
  – More elaborate parametric transformations
• Layered Depth Images [Shade et al. 1998]
• Still photos to mosaics [Bhat et al. 2007]
  – For rigid scenes
• Optic flow
  – Energy-based formulation [Bruhn et al. 2005]

Related work

• Point tracking
  – Sand and Teller [2006]
• Fitting motion models corresponding to multiple 3D motions
  – Bhat et al. [2007]
• Discover texture coordinates for predefined 3D surface models
  – Zigelman et al. [2002]
• Use of energy formulations
  – To regularize models
  – To make modelling assumptions coherent and explicit
  – [Fleet et al. 2002, Brox et al. 2004, Bruhn et al. 2005]

Discussion

• Without 3D shape recovery
  – Manipulations of the video which respect the three-dimensionality of the scene
• Viewing reconstruction as an embedding of tracks into 2D
• Embedding would be unstable
  – For more constrained models
  – Large amounts of missing data that self-occlusion causes
• We use approximate algorithms
  – No guarantees of globally minimizing the overall energy

Generalizations

• Automatically segment the layers
  – Many existing methods could be used
  – Sparse point tracks are clustered into independent motions
• Optimizing each layer simultaneously
• To allow non-boolean masks b
• Apply matrix factorization to the input tracks
  – To recover a deformable 3D shape
  – Can place constraints on the mapping which would allow outlier removal
  – Allow exact visibility to be computed
