mmai 2014 final
Naive Soul Guardian Bloody Scenes Detection with Deep Convolutional Neural Network
B99902080, R03944007
Outline
Motivation
System Overview
Convolutional Neural Network
Fully-Convolutional Net
Pixelation
Experiment
Future Work
Reference
Demo
Schedule guidance
Motivation
Many videos contain bloody scenes, and we want to protect kids from these inappropriate scenes.
Our system aims to detect and pixelate bloody scenes automatically.
System Overview
[Figure: pipeline diagram. Videos are decoded into frames. Frames labeled 0 are ignored; frames labeled 1 are pixelated. The frames are then encoded into pixelated videos.]
Convolutional Neural Network
Fine-tune pre-trained CaffeNet (ImageNet)
Human-labeled frames without bounding box
Predict decoded frames
Background (0) → ignored frames
Bloody frame (1) → fully-convolutional net
Fully-Convolutional Net
Classification for each 227 x 227 box with stride 32 on a 451 x 451 image
Generate an 8 x 8 classification map
Interpolate probabilities to obtain a heat map
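The arithmetic behind the 8 x 8 map can be checked with a short sketch. It assumes windows are placed without padding, and it uses bilinear interpolation as one possible way to upsample the map; the slides do not say which interpolation was used. The function names and the toy probability values are made up for illustration.

```python
def map_size(image_size, window, stride):
    """Number of window positions along one axis (no padding assumed)."""
    return (image_size - window) // stride + 1


def bilinear_resize(grid, out_h, out_w):
    """Resize a 2-D list of probabilities with bilinear interpolation."""
    in_h, in_w = len(grid), len(grid[0])
    out = []
    for y in range(out_h):
        # Map the output row back to a fractional input row.
        fy = y * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(fy)
        y1 = min(y0 + 1, in_h - 1)
        wy = fy - y0
        row = []
        for x in range(out_w):
            # Same mapping for the column.
            fx = x * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(fx)
            x1 = min(x0 + 1, in_w - 1)
            wx = fx - x0
            top = grid[y0][x0] * (1 - wx) + grid[y0][x1] * wx
            bottom = grid[y1][x0] * (1 - wx) + grid[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


# 227 x 227 windows with stride 32 on a 451 x 451 image.
side = map_size(451, 227, 32)

# Toy classification map: one "bloody" cell in the top-left corner (made-up values).
prob_map = [[0.0] * side for _ in range(side)]
prob_map[0][0] = 1.0

# Upsample the coarse map back to the 451 x 451 input size.
heat = bilinear_resize(prob_map, 451, 451)
```

Here `side` comes out as (451 - 227) / 32 + 1 = 8, which matches the 8 x 8 map on the slide.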
Pixelation
Resize heat map to frame size
Based on the heat map, blur frames with a Gaussian filter
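One way to read "blur based on the heat map" is a per-pixel blend between the original frame and a Gaussian-blurred copy, weighted by the heat value. The sketch below illustrates that reading on a tiny grayscale frame in pure Python. It assumes the heat map has already been resized to the frame size and lies in [0, 1]; the blend rule, `sigma`, `radius`, and all function names are assumptions, not the implementation used in the project.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian kernel of length 2 * radius + 1."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_1d(values, kernel):
    """Convolve one row or column with the kernel, clamping at the borders."""
    radius = len(kernel) // 2
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * values[j]
        out.append(acc)
    return out


def gaussian_blur(frame, sigma=2.0, radius=4):
    """Separable Gaussian blur of a 2-D grayscale frame (list of rows)."""
    kernel = gaussian_kernel(sigma, radius)
    rows = [blur_1d(row, kernel) for row in frame]
    cols = [blur_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def pixelate(frame, heat, sigma=2.0, radius=4):
    """Blend each pixel toward its blurred value in proportion to the heat."""
    blurred = gaussian_blur(frame, sigma, radius)
    return [[h * b + (1.0 - h) * p for p, b, h in zip(prow, brow, hrow)]
            for prow, brow, hrow in zip(frame, blurred, heat)]


# Toy 6 x 6 frame with two bright pixels; only the left half is "bloody".
frame = [[0.0] * 6 for _ in range(6)]
frame[2][1] = 255.0
frame[2][4] = 255.0
heat = [[1.0, 1.0, 1.0, 0.0, 0.0, 0.0] for _ in range(6)]
result = pixelate(frame, heat)
```

In this toy case the bright pixel in the left half is smeared out, while the one in the right half (heat 0) is left untouched. A soft blend like this also avoids hard edges between blurred and unblurred regions.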
Experiment (I)
Run on cml21
Decoding/encoding done by FFmpeg
Decoded frames as training/validation data
Pos = Segments from Saw 1, 2, 3, 7, Final Destination 4, 5 + crawled images from Google Images
Neg = Segments from The Big Bang Theory S8E11 + part of ILSVRC 2013 val/test
Random sample Pos : Neg = 2500 : 2500
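The balanced 2500 : 2500 sampling could look roughly like the sketch below. The file names, pool sizes, and seed are hypothetical placeholders; the labels follow the slides' convention of 1 for bloody frames and 0 for background.

```python
import random


def balanced_sample(pos_paths, neg_paths, per_class=2500, seed=0):
    """Draw the same number of positives and negatives, then shuffle them."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    pos = rng.sample(pos_paths, per_class)
    neg = rng.sample(neg_paths, per_class)
    # Label 1 = bloody frame, label 0 = background.
    labeled = [(p, 1) for p in pos] + [(n, 0) for n in neg]
    rng.shuffle(labeled)
    return labeled


# Hypothetical file names standing in for decoded frames and crawled images.
pos_paths = ["pos/%06d.jpg" % i for i in range(4000)]
neg_paths = ["neg/%06d.jpg" % i for i in range(6000)]
dataset = balanced_sample(pos_paths, neg_paths)
```

Sampling without replacement keeps each frame at most once, and equal class sizes mean a trivial classifier would score 50%, which gives a baseline for reading the reported accuracy.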
Experiment (II)
Classification accuracy: 73.46%
Test time
Experiment (III)
Time (sec) of processing a video clip

Clip (frames, resolution)     Decoding  Classification  Heat map  Pixelation  Encoding  Average time
Saw6 (139 frames, 720x404)    0.34      41.18           22.99     72.43       0.02      0.99 sec/frame
CWL (109 frames, 1280x720)    0.79      36.95           0         0           1.24      0.36 sec/frame
FD5 (121 frames, 1024x576)    0.44      36.23           3.87      28.72       0.81      0.58 sec/frame
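The average-time column appears to be the sum of the five stage times divided by the frame count. The sketch below checks that reading, assuming the stage times for each clip split as decoding, classification, heat map, pixelation, encoding; the variable and function names are illustrative.

```python
# (frame count, [decoding, classification, heat map, pixelation, encoding]) in seconds
clips = {
    "Saw6": (139, [0.34, 41.18, 22.99, 72.43, 0.02]),
    "CWL": (109, [0.79, 36.95, 0.0, 0.0, 1.24]),
    "FD5": (121, [0.44, 36.23, 3.87, 28.72, 0.81]),
}


def seconds_per_frame(frames, stage_times):
    """Total processing time across all stages divided by the frame count."""
    return sum(stage_times) / frames


averages = {name: round(seconds_per_frame(frames, times), 2)
            for name, (frames, times) in clips.items()}

for name, avg in averages.items():
    print("%s: %.2f sec/frame" % (name, avg))
```

Under this reading the computed values agree with the reported 0.99, 0.36, and 0.58 sec/frame. The CWL clip spends no time on heat maps or pixelation, which would be consistent with no frame being classified as bloody.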
Future Work
Train our model with more diverse data to increase accuracy and reduce false positives
Accelerate blurring and smooth boundaries
Implement on surveillance cameras for security
Combine shot detection and motion vectors to reduce computation
Reference
Caffe | Deep Learning Framework: http://caffe.berkeleyvision.org/
Classifying ImageNet: the instant Caffe way
Net Surgery for a Fully-Convolutional Model
FFmpeg: https://www.ffmpeg.org/
ImageNet: http://www.image-net.org/
Tutorials by Hsinfu, Shiro, Jocelyn
Demo
Finally, I wanna play a Q & A game