prof. anil kokaram just to get this out of the waysigmedia/pmwiki/uploads/teaching.1... · vision :...



Page 1: Prof. Anil Kokaram Just to get this out of the waysigmedia/pmwiki/uploads/Teaching.1... · Vision : The Human Visual System (HVS) Light is focussed onto the retina Electrical Impulses


IEEE Trans IP Assoc Ed
Prof. EEE Dept Trinity College Dublin
With TCD since 1998

Digital Cinema
Film/Video Post-production/Restoration

Multimedia Information Retrieval
Sports Informatics/Defect Analysis

Lo-Bandwidth Comms
VoW

PostDoc, Developer, 2 PhD

EU, TCD

EU RTN, EI, IntColl
PRTLI, IntColl

Sigmedia
Prof. Anil Kokaram

[email protected]

www.sigmedia.tv

MUSE-DTV: Machine Understanding of Sports Events
CASMS/LastActionReplay: Content Aware Sports Media Streaming
DysVideo: Using Video to diagnose Dyslexia (with Dept. Psychology TCD)
Content analysis for visual presentations (with Trinidad)

www.moumir.org

PRESTOSPACE

AXIOM

Just to get this out of the way …

To Dr. Bill Collis, Simon Robinson, Ben Kent and Dr. Anil Kokaram for the design and development of the Furnace integrated suite of software tools that robustly utilizes temporal coherence for enhancing visual effects in motion picture sequences. The Furnace toolset's modularity, flexibility and robustness have set a high standard of quality for optical flow-based image manipulation.

www.sigmedia.tv

www.foundry.co.uk

2007 Academy Award

What was it like?

Then back to Dub in Feb!

Trinidad

WHAT DOES THIS HAVE TO DO WITH 1E8?

1E8 Introduction to Electrical Engineering
Image and Video Processing
Or Electronics is not all about Circuits
Dr. Anil Kokaram www.sigmedia.tv

www.mee.tcd.ie/~ack [not a good page] -> Teaching
www.sigmedia.tv [MY GROUP’S site]
[email protected]

You will learn about Resistors, Inductors, Capacitors in 1E6 and electric circuit design in 1E7 [do NOT miss those labs]
But electronics is more than circuit design
This course in a way shows you that Engineering is more about problem solving than one particular discipline

1E8 is also about opening your horizons to the changing world of EEE
In my first year at university I wanted to do Mechanical Engineering
Thought electronics was too tricky
But then came the CD …. [1986] …. electronics changed forever …
I changed my mind because of lectures like these


EEEvolution Content

► Introduction to Image and Video Processing
► Human Visual Perception
► Cleaning Dirty Pictures (Motion Picture Restoration)
► Image and Video Compression
► Digital Compositing in the Movies

What we’re really doing …

► How does DVD/DTV work?
► What the hell is MPEG and JPEG exactly
► What is Digital Cinema? Who cares?
► What is digital compositing and how do they use it in the movies?
► What do people doing research actually do? And what have they done lately anyway? Who cares?

The Digital Image

The basic idea is to represent the continuous colours in a real image with a set of numbers from a fixed range at a subset of locations
The smallest element of a digital picture is called a Pixel (Picture Element)
Typically, a television image is created with 576 lines of 720 pixels each
A film frame is represented with over 2048 lines of 2880 pixels each
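To get a feel for the sizes quoted above, the arithmetic can be sketched in a few lines. The function name `pixels_per_frame` is made up for this illustration, and the film figure uses exactly 2048 lines although the slide says "over 2048".

```python
def pixels_per_frame(lines, pixels_per_line):
    """Number of pixel sites in one frame."""
    return lines * pixels_per_line

tv = pixels_per_frame(576, 720)
film = pixels_per_frame(2048, 2880)

print(tv)    # 414720
print(film)  # 5898240
# How many TV frames' worth of pixels fit in one film frame.
print(film / tv)
```

A film frame therefore carries roughly 14 times as many pixels as a television frame, before counting the three numbers stored at each pixel.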

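The Digital Image

[Figure: a 3 x 3 patch of pixels shown as three planes of numbers, labelled r, g and b]

101 109 110
 99  90  94
112 123 108

123 131 141
121 112 118
134 145 132

 38  46  65
 75  66  86
 88  99 100

The Digital Image

[Figure: image with axes h and k and origin 0; the pixel at position [h,k] has value I([h,k])]

I([50,50]) = [40 70 200]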
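The I([h,k]) notation maps directly onto a grid of number triples. The sketch below is only an illustration: the 100 x 100 size and the helper name `pixel` are assumptions, not something given on the slide.

```python
# Build a small colour image as a grid of [r, g, b] triples.
# Rows are indexed by h and columns by k, as on the slide.
HEIGHT, WIDTH = 100, 100   # assumed size, chosen only for this example
image = [[[0, 0, 0] for _ in range(WIDTH)] for _ in range(HEIGHT)]

# Store the pixel value quoted on the slide: I([50,50]) = [40 70 200].
image[50][50] = [40, 70, 200]

def pixel(img, h, k):
    """Return the [r, g, b] triple stored at position [h, k]."""
    return img[h][k]

red, green, blue = pixel(image, 50, 50)
print(red, green, blue)  # 40 70 200
```

Each of the three numbers sits in a fixed range (0 to 255 is a common choice), which is what makes the picture "digital".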


The Rise of Digital Visual Media

Began in the early 1960s with NASA
Rise of Digital Image Processing (DIP) coincides with availability of good picture reproduction/printing. [Why do DIP if you can’t see the results?]
The last 5 years have seen an exponential increase in digital visual devices: DV cameras, digital cameras
DTV since 1999. Free SKY in Ireland
Digital Video (Versatile) Disc allows movies to be played on a CD-like device
Internet streaming video, Real Networks, PacketVideo (video for mobile phones), PDAs

Complex systems

► Devices and media require increasingly complex system design
► Image and Video compression is one of the key technologies that enabled consumer devices like digital cameras
► Designer needs to understand compromises to be made in handling visual media
► A LOT of electronic circuit design is about making hardware for Digital Video Processing in Mobile Phones and HD Sets these days

Complex systems

[Diagram labels: High Level Design, Implementation, Hardware, Software, Co-design; examples: iPod, HDTV, Mobile, VoIP]

Motivation: Impact of Digital Media

TV Services merging with Internet?
DVD Replacing VHS.
Digital Television works.
Watching TV on your PC
Games consoles = TV = PC = internet access = DVD player [Sony PlayStation II/III]
Cheap, high quality broadcasting
Equality. Small producers can reach the same audience as large conglomerates
So many “channels” but nothing on!!

SO MANY WAYS TO WATCH MOVIES AND VIDEO!

Motivation: Impact of Digital Media

Digital TV broadcasters cannot find content
Mobile operators looking for more interesting content for their phone users e.g. through WAP or iMODE
Compelling content in demand
Content creators = producers of movies, editors of live events, character generators
Archives more important
Tools for Digital Movie making more important

WHO’S MAKING NEW STUFF TO WATCH?


ENTERTAINMENT IS THE DRIVER OF MODERN TECHNOLOGY IN THE 21ST CENTURY

UP TO THE START OF THE 1990’S THE COLD WAR WAS THE TECHNOLOGY DRIVER

Motivation: Automated Restoration

Archive material in demand
DVD re-release important
Pictures in bad shape
Need Automated restoration
A challenge
Compression is improved (Bonus!)

[Example pictures: Jitter, Ghosting, Dropout, Dirt and Sparkle]

DIY De-Blotching (To get you thinking …)

Detect then Interpolate

[Diagram: frames n-1, n, n+1, marking an area occluded in the next frame and an area uncovered from the previous frame]

A simple example

Some dirty pictures

[Figure: frames n-1, n, n+1]

Simple Detection

Hmmm … not so good. Too many false alarms due to bad motion
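One way to picture a "simple detection" rule is to flag a pixel in frame n when it differs strongly from the same site in both frame n-1 and frame n+1. The slide does not say which rule it used, so the sketch below is an assumption made for illustration; the function name `detect_blotches` and the threshold of 30 are hypothetical. It ignores motion on purpose, which is exactly what produces the false alarms.

```python
def detect_blotches(prev_frame, cur_frame, next_frame, threshold=30):
    """Flag pixels of cur_frame that differ strongly from BOTH neighbours.

    Frames are lists of rows of grey levels. No motion compensation is
    done, so anything that moves can be flagged by mistake.
    """
    mask = []
    for row_p, row_c, row_n in zip(prev_frame, cur_frame, next_frame):
        mask.append([
            abs(c - p) > threshold and abs(c - n) > threshold
            for p, c, n in zip(row_p, row_c, row_n)
        ])
    return mask

# A static scene with one speck of dirt in the middle frame.
flat = [[100] * 4 for _ in range(3)]
dirty = [row[:] for row in flat]
dirty[1][2] = 255
print(detect_blotches(flat, dirty, flat)[1])  # [False, False, True, False]

# A clean scene where a bright object moves one pixel per frame.
moving = detect_blotches([[200, 100, 100, 100]],
                         [[100, 200, 100, 100]],
                         [[100, 100, 200, 100]])
print(moving)  # [[False, True, False, False]]
```

The second call flags a pixel even though there is no dirt at all: the moving object looks like a blotch to a detector that knows nothing about motion.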


A Model for Blotches (How to make Dirty Pictures)

[Diagram: Original x Location of OK pixels + Corruption Data x Location of Corruption]
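Reading the diagram as "original picture times the map of OK pixels, plus corruption data times the map of corrupted pixels", a dirty picture can be synthesised as below. This reading, the function name `corrupt` and the use of a 0/1 mask are assumptions for the sake of a small example.

```python
def corrupt(original, corruption, blotch_mask):
    """Mix two pictures pixel by pixel using a binary mask.

    blotch_mask holds 1 where the pixel is corrupted and 0 where it is
    OK, so (1 - b) plays the role of the 'location of OK pixels' map.
    """
    observed = []
    for row_i, row_c, row_b in zip(original, corruption, blotch_mask):
        observed.append([
            (1 - b) * i + b * c
            for i, c, b in zip(row_i, row_c, row_b)
        ])
    return observed

original = [[10, 20, 30], [40, 50, 60]]
corruption = [[255, 255, 255], [0, 0, 0]]
blotch_mask = [[0, 1, 0], [0, 0, 1]]

print(corrupt(original, corruption, blotch_mask))  # [[10, 255, 30], [40, 50, 0]]
```

Restoration is then the reverse problem: given only the observed picture, estimate both the mask and the original.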

Better Detection

Use more knowledge: Blotches are flat and chunky

What about reconstructing the Picture?

[Figure: frames n-2, n-1, n, n+1, n+2]

Motion is key for Picture Building

[Figure: frames n-2, n-1, n, n+1, n+2]

But we don’t know the motion

[Figure: frames n-2, n-1, n, n+1, n+2]

Trick is to unify all sources of information

[Diagram labels: Motion, Picture, Blotches, Motion Smoothness, Picture Smoothness, Spatial Smoothness, Blotch Colour]

Needs a probabilistic framework. Need to know about probability and statistics


Motivation: Automated Restoration

Removing blotches, lines, noise

Motivation: Robust Video Transmission

► Video comms over difficult channels: Wireless, Internet
► MPEG4 (Moving Picture Experts Group) partly addresses this
► BUT encoder techniques not defined
► AND a few errors can lead to big problems. Error detection/correction/concealment v. important.

Motivation: Robust Video Transmission

[Pictures: MP4 Correctly Rec’d; MP4 with errors]

Motivation: Robust Video Transmission

[Pictures: MP4 with errors; Corrected with video processing]

Motivation: Digital Special Fx (Rig Removal)

www.mee.tcd.ie/~sigmedia/postpro/postpro
See paper on www.mee.tcd.ie/~sigmedia

Tool being used in movies now.
Produced by www.thefoundry.co.uk

Motivation: Information management/Retrieval [DVD Example]

► Somebody has to generate the DVD Index
► Use Storyboard (sometimes not available)
► But need FRAME accurate location (24 or 25 frames per second)
► Editors do this all the time e.g. news, live events. But have to watch the whole event.
► Painful to do by hand; possible but painful


Motivation: Information Retrieval
Automated Storyboarding? [DVD]

► Want to automatically segment the movie into Semantically Meaningful Scenes/Chapters
► Hard to make a signal processing algorithm which understands movies (don’t even try…)
► The basic building block is the SHOT
► Can detect each Shot automatically by detecting cuts

Motivation: Information Retrieval
Automated Storyboarding? [DVD]

[Diagram: Shot 1, Cut 1, Shot 2, Cut 2, Shot 3, Cut 3, Shot 4]

Motivation: Shot Cut Detection
Your first Image Processing Algorithm

► How to detect cuts automatically?
► Cuts = consecutive images that show drastic change between shots
► Need to define cuts Mathematically
► Remember Digital Images are composed of Pixels
► Each pixel is associated with 3 numbers
► Any ideas? (see next slide for reminder)

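The Digital Image (a reminder)

[Figure: image with axes h and k and origin 0; the pixel at position [h,k] has value I([h,k])]

I([50,50]) = [40 70 200]

A Video sequence is just a bunch of frames recorded at regular intervals in time.

[Figure: Frame 1, Frame 2, Frame 3]

Using position vector notation: $I(\mathbf{x})$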

Motivation: Shot Cut Detection
Your first Image Processing Algorithm

$e_n = \sum_{\mathbf{x}} \left| I_n(\mathbf{x}) - I_{n-1}(\mathbf{x}) \right|$

[Plot: $e_n$ against Frame Number, frames 0 to 1500]
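The frame difference above turns into code almost word for word. The sketch assumes grey-level frames stored as lists of rows; the threshold and the tiny 2 x 2 frames are invented for the example, and `detect_cuts` is a hypothetical helper name.

```python
def frame_difference(frame_a, frame_b):
    """Sum over all pixel sites of |frame_a(x) - frame_b(x)|."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(frame_a, frame_b)
        for a, b in zip(row_a, row_b)
    )

def detect_cuts(frames, threshold):
    """Return every frame index n whose difference e_n exceeds the threshold."""
    return [
        n for n in range(1, len(frames))
        if frame_difference(frames[n], frames[n - 1]) > threshold
    ]

dark = [[10, 12], [11, 10]]
dark_again = [[11, 12], [10, 10]]     # same shot, tiny change
bright = [[200, 210], [205, 198]]     # a different shot
frames = [dark, dark_again, bright, bright]

print(frame_difference(dark_again, dark))  # 2
print(detect_cuts(frames, threshold=100))  # [2]
```

Choosing the threshold is the awkward part: too low and ordinary motion triggers it, too high and gentle cuts are missed.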

Motivation: Shot Cut Detection
Your first Image Processing Algorithm

► Frame Difference not so good
► Histograms better (will show you what these are later)
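As a preview of why histograms can do better, the sketch below counts how many pixels fall into each brightness band and compares the counts between two frames. The bin count of 8, the 0 to 255 range and the function names are assumptions for this illustration, not the method used in the lectures.

```python
def histogram(frame, bins=8, levels=256):
    """Count how many pixels fall into each of `bins` brightness bands."""
    counts = [0] * bins
    for row in frame:
        for value in row:
            counts[value * bins // levels] += 1
    return counts

def histogram_difference(frame_a, frame_b, bins=8):
    """Sum of absolute differences between the two histograms."""
    hist_a, hist_b = histogram(frame_a, bins), histogram(frame_b, bins)
    return sum(abs(a - b) for a, b in zip(hist_a, hist_b))

# Same shot: a bright object moves one pixel to the right.
before = [[200, 10, 10, 10]]
after = [[10, 200, 10, 10]]
# A cut to a much brighter shot.
new_shot = [[220, 230, 240, 250]]

print(histogram_difference(before, after))     # 0
print(histogram_difference(before, new_shot))  # 6
```

The histogram does not care where a pixel is, only how bright it is, so motion inside a shot barely changes it while a change of scene usually does.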


Motivation: Information management/Retrieval

► Indexing is part of a BIGGER problem
► Want to access digital media in a way relevant to users
► You are familiar with text searching on Internet e.g. AltaVista, Google sites
► You want to look something up: just type in a keyword combination
► Works pretty well if your keywords exist in the documents to be searched i.e. text is OK

► Picture archives like www.bridgeman.co.uk
► Book publisher wants a picture from the Manet collection with predominantly red shades, and trees in the background. HUH?
► Nobody thought of keywording pictures for predominant colour when they were first put into the database !!!!
► So ask the 4 people in cataloguing if they can REMEMBER seeing a picture like that ??

Motivation: Information Retrieval Example

Motivation: Information Retrieval Possible

► Can solve these problems by using Signal/Image Processing
► Analyse the data itself AS THE NEW QUERY OCCURS
► Predominant Colour : easy to automate: remember RGB pixels?
► Spotting famous people in video: harder but possible
► Biometrics ….
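To show why predominant colour is easy to automate, here is a minimal sketch that adds up the red, green and blue numbers over the whole picture and reports the largest. Real retrieval systems use richer descriptions of colour; this rule and the function name `predominant_colour` are assumptions for illustration only.

```python
def predominant_colour(image):
    """Return 'red', 'green' or 'blue': the channel with the largest total."""
    totals = [0, 0, 0]
    for row in image:
        for r, g, b in row:
            totals[0] += r
            totals[1] += g
            totals[2] += b
    names = ["red", "green", "blue"]
    return names[totals.index(max(totals))]

# A made-up 2 x 2 picture with mostly red pixels.
reddish = [
    [(200, 30, 40), (180, 60, 20)],
    [(220, 10, 10), (90, 100, 80)],
]
print(predominant_colour(reddish))  # red
```

Because the answer is computed from the pixels when the query arrives, nobody has to have typed "red" into the database beforehand.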

Image Moments for Sports Parsing

Content Based Analysis for Video from Snooker Broadcasts, H. Denman, N. Rea and A. Kokaram, International Conference on Image and Video Retrieval (CIVR), July 2002, London, UK

Content Retrieval for Snooker at www.mee.tcd.ie/~sigmedia

Rea, Denman, Kokaram

Tracking with a particle filter
Using histograms for matching

Content Based Analysis for Video from Snooker Broadcasts, H. Denman, N. Rea and A. Kokaram, to be published in Computer Vision and Image Understanding Journal (CVIU): Special Issue on Video Retrieval and Summarization

Multimodal Fusion (?) www.mee.tcd.ie/~sigmedia

Dahyot, Rea, Denman, Delacourt, Kokaram

Joint Audio Visual Retrieval for Tennis Broadcasts, R. Dahyot, A. Kokaram, N. Rea and H. Denman, International Conference on Acoustics, Speech, and Signal Processing, 2003


Conferences/Journals/etc

► ICASSP, ICIP (Research/Industrial)
► NAB, IBC (Industrial)
► www.citeseer.org
► IEEE Trans IP, Ccts Sys Video Tech, PAMI, SP, SPL, Systems & Cybernetics [H/M Interfacing, Sensors etc]
► EURASIP SP, Image Comms.
► SIGGRAPH: Cool conference for movie special fx industry
► www.howstuffworks.com

Overview

► Image and Video Processing useful for more than just compression
► Many interesting new areas driven by availability of new devices
► Information Retrieval from Digital Media requires even more understanding of Image and Video analysis [MPEG7]
► Let’s have a look at some background …

Sampling an audio signal

► Original CD Audio 44.1 kHz
► Downsampled by 4 = 11.025 kHz (no anti-aliasing)
► Downsampled by 8 = 5.5 kHz (no anti-aliasing)
► Downsampled by 8 = 5.5 kHz (with anti-aliasing)
► Sampling frequency affects the position of the samples in time hence affects frequency content of signal (sort of)
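The effect of downsampling without anti-aliasing can be seen on a single tone. The sketch below is an illustration under stated assumptions: a 4 kHz test tone (not from the slides), downsampling by simply keeping every 8th sample, and a rough frequency estimate from counting sign changes. The "with anti-aliasing" case would low-pass filter the signal before discarding samples; that filter is not shown here.

```python
import math

def tone(freq_hz, sample_rate_hz, duration_s):
    """Samples of a sine wave at freq_hz."""
    n = int(sample_rate_hz * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate_hz) for i in range(n)]

def downsample(samples, factor):
    """Keep every factor-th sample, with no anti-aliasing filter."""
    return samples[::factor]

def estimate_frequency(samples, sample_rate_hz):
    """Rough frequency estimate: a sine changes sign twice per cycle."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    duration_s = len(samples) / sample_rate_hz
    return crossings / (2 * duration_s)

CD_RATE = 44100.0
FACTOR = 8
original = tone(4000.0, CD_RATE, 1.0)
decimated = downsample(original, FACTOR)
new_rate = CD_RATE / FACTOR

print(new_rate)  # 5512.5
print(round(estimate_frequency(original, CD_RATE)))
print(round(estimate_frequency(decimated, new_rate)))
```

At 5512.5 samples per second only frequencies below about 2756 Hz can be represented, so the 4 kHz tone does not vanish: it reappears near 5512.5 - 4000 = 1512.5 Hz, a pitch that was never in the original.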

Quantisation of the samples to make a digital signal

► Original CD Audio 16 bit 44.1 kHz (65536 levels)
► 8 bit Quantisation (256 levels)
► 4 bit Quantisation (16 levels)
► 2 bit Quantisation (4 levels)
► Quantisation introduces NOISE into the digital signal because the accuracy of the digital samples as compared to the analogue signal is affected
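The link between the number of bits and the noise can be checked numerically. This is a minimal sketch assuming a uniform quantiser on the range -1 to 1 and a 440 Hz test tone; both choices, and the function names, are made up for the example.

```python
import math

def quantise(sample, bits):
    """Map a sample in [-1, 1) to the centre of one of 2**bits levels."""
    levels = 2 ** bits
    step = 2.0 / levels
    index = min(int((sample + 1.0) / step), levels - 1)
    return -1.0 + (index + 0.5) * step

def rms_error(samples, bits):
    """Root-mean-square difference between the samples and their quantised values."""
    errors = [s - quantise(s, bits) for s in samples]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# One tenth of a second of a 440 Hz tone at the CD sampling rate.
signal = [0.9 * math.sin(2 * math.pi * 440 * n / 44100) for n in range(4410)]

for bits in (16, 8, 4, 2):
    print(bits, "bits,", 2 ** bits, "levels, rms error", rms_error(signal, bits))
```

Every bit removed doubles the spacing between levels, so the error (the quantisation noise) roughly doubles as well.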

Vision : The Human Visual System (HVS)

Light is focussed onto the retina
Electrical Impulses from the retina are channelled by the optic nerve to the Visual Cortex
The Visual Cortex does a whole bunch of smart things including filtering, object recognition, edge detection.
In `primitive animals’ A LOT of processing happens just behind the retina. Frogs and Rabbits have TEMPLATES for spotting birds of prey.
Our motion sensitivity is better at the periphery of vision than at the centre. [Helps to avoid people sneaking up on you.]

[Diagram of the eye: Lens, Pupil, Retina, Optic Nerve, Blind Spot (hence the usual card trick ..)]

Intensity Sensitivity of HVS


[Figure: two patches, Foreground 167 and Foreground 140]

Objects appear to have similar brightness

Frequency Sensitivity

Grating increases in freq. Left to Right
Intensity decreases vertically
Sensitivity given by j.n.d. junction

[Plot: Greyscale (-) and Sensitivity (--) against Column]

Spatial Freq. Response
Mach Banding

Meaning of Spatial Frequency

[Diagram: Monitor Display with lengths h and 5h marked (CCIR rec 500)]

$\tan^{-1}(h/(5h)) = \tan^{-1}(1/5) = 11.3$ degrees

768 pels = 11.3 degrees
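The angle quoted for the monitor can be reproduced with a few lines of arithmetic. The sketch assumes that h is the picture height and 5h the viewing distance, which is how the diagram appears to be labelled; the variable names are invented for the example.

```python
import math

def viewing_angle_degrees(picture_height, viewing_distance):
    """Angle subtended by the picture, using the slide's tan^-1(h / 5h) form."""
    return math.degrees(math.atan(picture_height / viewing_distance))

angle = viewing_angle_degrees(1.0, 5.0)   # h = 1, distance = 5h
pels_per_degree = 768 / angle

print(round(angle, 1))         # 11.3
print(round(pels_per_degree))  # 68
```

Since one cycle of a grating needs at least two pels (one light, one dark), about 68 pels per degree means the display can show spatial frequencies up to roughly 34 cycles per degree at this distance.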

Vision: Your colour sensitivity isn’t great cf. intensity

Rods are active at low light levels
Cones allow you to see colour
There are 100 Million Rods and 7 Million Cones in your retina
Hence you SAMPLE luminance space A LOT more finely (at a higher frequency) than COLOUR SPACE
The cells are arranged in a hexagonal pattern .. Hence some suspect that a hexagonal arrangement of light sensitive ccts in cameras is a good idea. Better than rectangular anyway.
Fuji cameras claim to use hex grids, while others use normal grids at higher densities of elements

[Diagram of the Retina: Rod, Cone]

Consequences of Colour Sensitivity

Original Image: original at full colour resolution
512 x 512 x 3 = 0.64 MB


Subsampling Colour Planes

2:1 in both directions

[Diagram: sampling grid marking which samples to Keep and which to Discard]

4:1 Colour Downsampling: OK
512 x 512 + 256 x 256 x 2 = 0.31 MB (1/2 bandwidth of original)

16:1 Colour Downsampling: Still OK
512 x 512 + 128 x 128 x 2 = 0.24 MB (1/3 bandwidth of original)

16:1 Luminance Downsampling: Not good
128 x 128 x 3 = 0.04 MB (1/16 bandwidth of original)
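The bandwidth fractions can be checked by counting samples. The sketch assumes one brightness plane plus two colour planes, each 512 x 512, and subsamples by keeping every second (or fourth) sample in both directions with no filtering. It reproduces the ratios only; the MB figures on the slides presumably come from the actual test picture and are not recomputed here. The names `subsample` and `sample_count` are hypothetical.

```python
def subsample(plane, factor):
    """Keep every factor-th sample in both directions, discard the rest."""
    return [row[::factor] for row in plane[::factor]]

def sample_count(*planes):
    """Total number of samples stored across the given planes."""
    return sum(len(p) * len(p[0]) for p in planes)

SIZE = 512
luma = [[0] * SIZE for _ in range(SIZE)]
chroma_a = [[0] * SIZE for _ in range(SIZE)]
chroma_b = [[0] * SIZE for _ in range(SIZE)]

full = sample_count(luma, chroma_a, chroma_b)

# 4:1 colour downsampling (2:1 in both directions).
colour_4 = sample_count(luma, subsample(chroma_a, 2), subsample(chroma_b, 2))
# 16:1 colour downsampling (4:1 in both directions).
colour_16 = sample_count(luma, subsample(chroma_a, 4), subsample(chroma_b, 4))
# 16:1 downsampling of every plane, brightness included.
all_16 = sample_count(subsample(luma, 4), subsample(chroma_a, 4), subsample(chroma_b, 4))

print(full)               # 786432
print(colour_4 / full)    # 0.5
print(colour_16 / full)   # 0.375
print(all_16 / full)      # 0.0625
```

Going from 4:1 to 16:1 on the colour planes saves comparatively little, because the untouched brightness plane already dominates the total.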

A Noisy Picture
Activity Masking and Weber’s Law

Noise hard to see in Textured areas, easy in flat areas
Noise harder to see in Bright Areas than dark Areas

[Figure: I(h,k) and I(h,k)+e(h,k)]