course project - deep-generative-models.github.io
GANimation on TensorLayer
and Thoughts on a Better Training Dataset
朱峰要曙丽
1801221169 1901210768
Course Project
School of Arts, School of Software and Microelectronics
⚫ StarGAN can only generate a discrete number of expressions, determined by the content of the dataset.
⚫ This paper introduces a novel GAN conditioning scheme based on Action Units (AU) annotations, which describes in a continuous manifold the anatomical facial movements defining a human expression.
Review
⚫ Additionally, we propose a fully unsupervised strategy to train the model that only requires images annotated with their activated AUs, and exploit attention mechanisms that make our network robust to changing backgrounds and lighting conditions.
⚫ We leverage the recent EmotioNet dataset, which consists of one million images of facial expressions in the wild annotated with discrete AU activations (we use 200,000 of them).
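Because AU activations live on a continuous manifold, a target expression can be any interpolation between annotated vectors, rather than one of a discrete set of labels. A minimal sketch (the AU index and activation values below are illustrative, not from the dataset):

```python
# Hypothetical sketch: AU conditioning is continuous, so target expressions
# can be blended. Index 8 corresponds to AU12 (lip corner puller) in the
# 17-AU ordering AU01..AU45 used by the dataset.
def interpolate_aus(y_from, y_to, alpha):
    """Blend two 17-dim AU activation vectors with weight alpha in [0, 1]."""
    return [(1 - alpha) * a + alpha * b for a, b in zip(y_from, y_to)]

neutral = [0.0] * 17
smile = [0.0] * 17
smile[8] = 0.9          # AU12 strongly activated
half_smile = interpolate_aus(neutral, smile, 0.5)
print(half_smile[8])    # 0.45
```

Sweeping alpha from 0 to 1 yields a smooth animation between the two expressions, which is exactly what a discrete-label scheme like StarGAN cannot do.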
Framework
Step 1. Input: original image and expression tags
Step 2. Image generation
Step 3. Input again: synthesized image, original expression tags
Step 4-1. Generate again
Step 4-2. Discriminator: evaluate the quality of the generated image and its expression
Step 5. Final image
Two blocks. Generator: GA, GI; Discriminator: DI, Dy
Attention-based Generator
Attention mask A: the generator can focus exclusively on the pixels defining facial movements, leading to sharper and more realistic synthetic images.
Color mask C: the generator does not need to render static elements.
Goal: generating an image of the same ID with a given expression.
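The two masks combine into the final image as a per-pixel blend: where A is close to 1 the original pixel is kept, and where A is close to 0 the rendered color C takes over. A minimal scalar sketch of that composition (pixel values are illustrative):

```python
# Minimal sketch of the attention-based composition, one scalar per "pixel":
# final = A * original + (1 - A) * C.
def compose(orig, color, attention):
    return [a * o + (1 - a) * c for o, c, a in zip(orig, color, attention)]

orig = [0.2, 0.8, 0.5]    # original pixels
color = [0.9, 0.1, 0.6]   # color mask C rendered by the generator
attn = [1.0, 0.0, 0.5]    # attention mask A: 1 = static region, 0 = facial movement
print(compose(orig, color, attn))  # [0.2, 0.1, 0.55]
```

This is why static elements need not be rendered: wherever A saturates to 1, the output is copied straight from the input image.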
Discriminator
Goal: Evaluate the quality of the generated image (Real Photo? Given ID?) and its expression (Given tag?)
Loss Function
Goal of Discriminator: Evaluate the quality of the generated image (Real Photo? Given ID?) and its expression (Given tag?)
Goal of Generator: generating an image of the same ID with a given expression.
1 Image Adversarial Loss
2 Attention Loss
3 Conditional Expression Loss
4 Identity Loss
Full Loss
Gradient penalty
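The component losses combine into the full loss as a weighted sum. The weights below follow the lambda_* values in the training configuration; the mapping of weight names to the four losses above is our reading, and the helper is a hypothetical sketch, not the actual implementation:

```python
# Hypothetical sketch: the full objective as a weighted sum of component losses.
# Weight values follow the lambda_* constants in the training configuration.
LAMBDAS = {
    "adv": 1.0,      # image adversarial loss      (lambda_D_img)
    "au": 4000.0,    # conditional expression loss (lambda_D_au)
    "gp": 10.0,      # gradient penalty            (lambda_D_gp)
    "idt": 10.0,     # identity (cycle) loss       (lambda_cyc)
    "attn": 0.1,     # attention mask loss         (lambda_mask)
    "smooth": 1e-5,  # attention smoothness term   (lambda_mask_smooth)
}

def full_loss(parts):
    """parts maps a loss name in LAMBDAS to its raw (unweighted) value."""
    return sum(LAMBDAS[k] * v for k, v in parts.items())

print(full_loss({"adv": 0.5, "idt": 0.02}))  # 0.7
```

The very large expression weight (4000) reflects how strongly the AU regression has to be enforced relative to the adversarial term.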
Implementation Details
Generator: built from CycleGAN, slightly modified by substituting the last convolutional layer with two parallel convolutional layers: one to regress the color mask C and the other to define the attention mask A.
Discriminator: adopts the PatchGAN architecture, with the gradient penalty computed with respect to the entire batch.
Other details: The model is trained on the EmotioNet dataset. We use a subset of 200,000 samples (out of over 1 million) to reduce training time. We use Adam with a learning rate of 0.0001, beta1 0.5, beta2 0.999, and batch size 25. We train for 30 epochs and linearly decay the rate to zero over the last 10 epochs. Every 5 optimization steps of the critic network, we perform a single optimization step of the generator. The model takes two days to train on a single GeForce GTX 1080 Ti GPU.
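The two parallel output heads can be sketched in plain numpy, modelling each head as a 1x1 convolution over the last shared feature map; the weights and shapes here are illustrative, not the TensorLayer code:

```python
import numpy as np

# Hedged numpy sketch of the generator's two parallel output heads:
# one regresses the color mask C (tanh), the other the attention mask A (sigmoid).
rng = np.random.default_rng(0)
feat = rng.standard_normal((1, 128, 128, 64))  # features from the last shared layer

W_C = rng.standard_normal((64, 3)) * 0.01      # 1x1-conv weights, color head
W_A = rng.standard_normal((64, 1)) * 0.01      # 1x1-conv weights, attention head

C = np.tanh(feat @ W_C)                        # color mask, (1, 128, 128, 3)
A = 1.0 / (1.0 + np.exp(-(feat @ W_A)))        # attention mask, (1, 128, 128, 1)
print(C.shape, A.shape)  # (1, 128, 128, 3) (1, 128, 128, 1)
```

The tanh keeps color values in [-1, 1] to match the normalized image range, while the sigmoid keeps the attention weights in (0, 1) so they can act as blending coefficients.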
EmotioNet (CVPR 2016) - Overview
SHAPE: angles + lengths | Action Units | SHADE: Gabor filters
EmotioNet: An accurate, real-time algorithm for the automatic annotation of a million facial expressions in the wild
Shoulder Pain Database
Denver Intensity of Spontaneous Facial Action (DISFA) dataset
Database of Compound Facial Expressions of Emotion (CFEE)
1 million pics from WordNet → Automatic Annotator → EmotioNet
EmotioNet - 4 Limits
1 Lack of Various Directions
2 No Occlusion
3 Inaccurate Annotation
4 Discrete AU Intensity
Synthetic Data vs. EmotioNet
Body Pose Estimation
Object Detection/Recognition
Facial Landmark Localization
Lack of Various Directions | No Occlusion | Inaccurate Annotation | Discrete AU Intensity
And Super Resolution, Frame Interpolation... etc.
Implementation - 2 Possible Methods
No.1 3D Reconstruction + Render
Various Directions | Occlusion | Accurate Annotation | AU Intensity | Data Production
Y | Y | N | Continuous | Efficient

Example AU annotations:

confidence | AU01 | AU02 | AU04 | AU05 | AU06 | AU07 | AU09 | AU10 | AU12 | AU14 | AU15 | AU17 | AU20 | AU23 | AU25 | AU26 | AU45
0.875 | 0.11 | 0.27 | 0 | 0.33 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.77 | 0 | 0.82 | 0 | 0 | 0
0.975 | 0.09 | 0 | 0 | 0.18 | 0.25 | 0.6 | 0 | 0.37 | 0.53 | 0.57 | 0 | 0 | 0 | 0.28 | 0.95 | 0.32 | 0

confidence | AU01 | AU02 | AU04 | AU05 | AU06 | AU07 | AU09
0.975 | 0.02 | 0.73 | 0.48 | 0.36 | 0 | 0 | 0.31
AU14 | AU15 | AU17 | AU20 | AU23 | AU25 | AU26 | AU45
0 | 0.96 | 0.74 | 0.44 | 0.05 | 0 | 0.16 | 0
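One such annotation can be represented as a confidence score plus a 17-dimensional AU activation vector. A hypothetical sketch using the values of the first sample above:

```python
# Hedged sketch: one annotation as a confidence score plus 17 AU activations,
# in the AU01..AU45 ordering used above. Structure is illustrative.
AU_NAMES = ["AU01", "AU02", "AU04", "AU05", "AU06", "AU07", "AU09", "AU10",
            "AU12", "AU14", "AU15", "AU17", "AU20", "AU23", "AU25", "AU26", "AU45"]

sample = {
    "confidence": 0.875,
    "aus": [0.11, 0.27, 0, 0.33, 0, 0, 0, 0, 0, 0, 0, 0.77, 0, 0.82, 0, 0, 0],
}

# Keep only the activated AUs (the part that actually drives the generator).
active = {name: v for name, v in zip(AU_NAMES, sample["aus"]) if v > 0}
print(active)  # {'AU01': 0.11, 'AU02': 0.27, 'AU05': 0.33, 'AU17': 0.77, 'AU23': 0.82}
```

The full 17-dim vector, not just the active entries, is what conditions the generator, so zero activations are informative too.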
Implementation - 2 Possible Methods
No.2 Pure CG Project
Various Directions | Occlusion | Accurate Annotation | AU Intensity | Data Production
Y | Y | Y | Continuous | Inefficient
[Architecture figure: tensor shapes 128*128*3, 128*128*17, 128*128*20]
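A plausible reading of these shapes (a numpy sketch, not the actual implementation): the 17-dim AU vector is tiled over the 128x128 spatial grid and concatenated channel-wise with the RGB image, giving the 20-channel generator input:

```python
import numpy as np

# Hedged sketch of the generator input construction: tile the AU vector over
# the spatial grid, then concatenate with the image along the channel axis.
image = np.zeros((128, 128, 3))                   # input face image
au_vec = np.zeros(17)                             # target AU activations
au_map = np.broadcast_to(au_vec, (128, 128, 17))  # tiled to 128x128x17
gen_input = np.concatenate([image, au_map], axis=-1)
print(gen_input.shape)  # (128, 128, 20)
```

Tiling the condition over the grid lets an ordinary convolutional encoder see the target expression at every spatial location.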
GANimation Architecture
GANimation on TensorLayer - Generator
GANimation on TensorLayer - Discriminator
Loss of Generator
Loss of Discriminator
GANimation on TensorLayer - Loss
Train
BATCH_SIZE = 25
EPOCHS = 30
lambda_D_img = 1
lambda_D_au = 4000
lambda_D_gp = 10
lambda_cyc = 10
lambda_mask = 0.1
lambda_mask_smooth = 1e-5
if e <= 21:
    lr_now = 1e-4
else:
    lr_now = 1e-5 * (EPOCHS + 1 - e)
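The rate schedule above can be checked in isolation; note that with this formula the rate steps down linearly from epoch 22 and ends at 1e-5 in epoch 30, rather than literally reaching zero. A standalone sketch of the same logic:

```python
# Standalone sketch of the learning-rate schedule used in training.
EPOCHS = 30

def lr_for_epoch(e):
    """Constant 1e-4 through epoch 21, then linear decay over the last epochs."""
    return 1e-4 if e <= 21 else 1e-5 * (EPOCHS + 1 - e)

print(lr_for_epoch(21))  # 0.0001
print(lr_for_epoch(30))  # 1e-05
```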
GANimation on TensorLayer - Train
Results – Single AUs’ Masks
Results – Multiple AUs
Thanks.