course project - deep-generative-models.github.io
GANimation on TensorLayer
and Thoughts on a Better Training Dataset
朱峰要曙丽
1801221169 1901210768
Course Project
School of Arts, School of Software and Microelectronics
⚫ StarGAN can only generate a discrete number of expressions, determined by the content of the dataset.
⚫ This paper introduces a novel GAN conditioning scheme based on Action Units (AU) annotations, which describes in a continuous manifold the anatomical facial movements defining a human expression.
Review
⚫ Additionally, we propose a fully unsupervised strategy to train the model that only requires images annotated with their activated AUs, and exploit attention mechanisms that make our network robust to changing backgrounds and lighting conditions.
⚫ We leverage the recent EmotioNet dataset, which consists of one million images of facial expressions in the wild annotated with discrete AU activations (we use 200,000 of them).
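Because AU activations live on a continuous manifold, a target expression can be any interpolation between annotated vectors, rather than one of a discrete set of labels. A minimal sketch (the AU index and activation values below are illustrative, not from the dataset):

```python
# Hypothetical sketch: AU conditioning is continuous, so target expressions
# can be blended. Index 8 corresponds to AU12 (lip corner puller) in the
# 17-AU ordering AU01..AU45 used by the dataset.
def interpolate_aus(y_from, y_to, alpha):
    """Blend two 17-dim AU activation vectors with weight alpha in [0, 1]."""
    return [(1 - alpha) * a + alpha * b for a, b in zip(y_from, y_to)]

neutral = [0.0] * 17
smile = [0.0] * 17
smile[8] = 0.9          # AU12 strongly activated
half_smile = interpolate_aus(neutral, smile, 0.5)
print(half_smile[8])    # 0.45
```

Sweeping alpha from 0 to 1 yields a smooth animation between the two expressions, which is exactly what a discrete-label scheme like StarGAN cannot do.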
Framework
Step 1. Input: original image and expression tags
Step 2. Image generation
Step 3. Input again: synthesized image, original expression tags
Step 4-1. Generate again
Step 4-2. Discriminator: evaluate the quality of the generated image and its expression
Step 5. Final image
Two blocks. Generator: GA, GI; Discriminator: DI, Dy
Attention-based Generator
Attention mask A: the generator can focus exclusively on the pixels defining facial movements, leading to sharper and more realistic synthetic images.
Color mask C: the generator does not need to render static elements.
Goal: generating an image of the same ID with a given expression.
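The two masks combine into the final image as a per-pixel blend: where A is close to 1 the original pixel is kept, and where A is close to 0 the rendered color C takes over. A minimal scalar sketch of that composition (pixel values are illustrative):

```python
# Minimal sketch of the attention-based composition, one scalar per "pixel":
# final = A * original + (1 - A) * C.
def compose(orig, color, attention):
    return [a * o + (1 - a) * c for o, c, a in zip(orig, color, attention)]

orig = [0.2, 0.8, 0.5]    # original pixels
color = [0.9, 0.1, 0.6]   # color mask C rendered by the generator
attn = [1.0, 0.0, 0.5]    # attention mask A: 1 = static region, 0 = facial movement
print(compose(orig, color, attn))  # [0.2, 0.1, 0.55]
```

This is why static elements need not be rendered: wherever A saturates to 1, the output is copied straight from the input image.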
Discriminator
Goal: Evaluate the quality of the generated image (Real Photo? Given ID?) and its expression (Given tag?)
Loss Function
Goal of Discriminator: Evaluate the quality of the generated image (Real Photo? Given ID?) and its expression (Given tag?)
Goal of Generator: generating an image of the same ID with a given expression.
1 Image Adversarial Loss
2 Attention Loss
3 Conditional Expression Loss
4 Identity Loss
Full Loss
Gradient penalty
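The component losses combine into the full loss as a weighted sum. The weights below follow the lambda_* values in the training configuration; the mapping of weight names to the four losses above is our reading, and the helper is a hypothetical sketch, not the actual implementation:

```python
# Hypothetical sketch: the full objective as a weighted sum of component losses.
# Weight values follow the lambda_* constants in the training configuration.
LAMBDAS = {
    "adv": 1.0,      # image adversarial loss      (lambda_D_img)
    "au": 4000.0,    # conditional expression loss (lambda_D_au)
    "gp": 10.0,      # gradient penalty            (lambda_D_gp)
    "idt": 10.0,     # identity (cycle) loss       (lambda_cyc)
    "attn": 0.1,     # attention mask loss         (lambda_mask)
    "smooth": 1e-5,  # attention smoothness term   (lambda_mask_smooth)
}

def full_loss(parts):
    """parts maps a loss name in LAMBDAS to its raw (unweighted) value."""
    return sum(LAMBDAS[k] * v for k, v in parts.items())

print(full_loss({"adv": 0.5, "idt": 0.02}))  # 0.7
```

The very large expression weight (4000) reflects how strongly the AU regression has to be enforced relative to the adversarial term.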
Implementation Details
Generator: built from CycleGAN, slightly modified by substituting the last convolutional layer with two parallel convolutional layers: one to regress the color mask C and the other to define the attention mask A.
Discriminator: adopts the PatchGAN architecture, with the gradient penalty computed with respect to the entire batch.
Other details: The model is trained on the EmotioNet dataset. We use a subset of 200,000 samples (out of over 1 million) to reduce training time. We use Adam with a learning rate of 0.0001, beta1 0.5, beta2 0.999, and batch size 25. We train for 30 epochs and linearly decay the rate to zero over the last 10 epochs. Every 5 optimization steps of the critic network, we perform a single optimization step of the generator. The model takes two days to train on a single GeForce GTX 1080 Ti GPU.
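The two parallel output heads can be sketched in plain numpy, modelling each head as a 1x1 convolution over the last shared feature map; the weights and shapes here are illustrative, not the TensorLayer code:

```python
import numpy as np

# Hedged numpy sketch of the generator's two parallel output heads:
# one regresses the color mask C (tanh), the other the attention mask A (sigmoid).
rng = np.random.default_rng(0)
feat = rng.standard_normal((1, 128, 128, 64))  # features from the last shared layer

W_C = rng.standard_normal((64, 3)) * 0.01      # 1x1-conv weights, color head
W_A = rng.standard_normal((64, 1)) * 0.01      # 1x1-conv weights, attention head

C = np.tanh(feat @ W_C)                        # color mask, (1, 128, 128, 3)
A = 1.0 / (1.0 + np.exp(-(feat @ W_A)))        # attention mask, (1, 128, 128, 1)
print(C.shape, A.shape)  # (1, 128, 128, 3) (1, 128, 128, 1)
```

The tanh keeps color values in [-1, 1] to match the normalized image range, while the sigmoid keeps the attention weights in (0, 1) so they can act as blending coefficients.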
EmotioNet (CVPR 2016) - Overview
SHAPE: angles + lengths | Action Units | SHADE: Gabor filters
EmotioNet: An accurate, real-time algorithm for the automatic annotation of a million facial expressions in the wild
Shoulder Pain Database
Denver Intensity of Spontaneous Facial Action (DISFA) dataset
Database of Compound Facial Expressions of Emotion (CFEE)
1 million pics from WordNet → Automatic Annotator → EmotioNet
EmotioNet - 4 Limits
1 Lack of Various Directions
2 No Occlusion
3 Inaccurate Annotation
4 Discrete AU Intensity
Synthetic Data vs. EmotioNet
Body Pose Estimation
Object Detection/Recognition
Facial Landmark Localization
Lack of Various Directions | No Occlusion | Inaccurate Annotation | Discrete AU Intensity
And Super Resolution, Frame Interpolation... etc.
Implementation - 2 Possible Methods
No.1 3D Reconstruction + Render
Various Directions | Occlusion | Accurate Annotation | AU Intensity | Data Production
Y | Y | N | Continuous | Efficient

Example AU annotations:

confidence | AU01 | AU02 | AU04 | AU05 | AU06 | AU07 | AU09 | AU10 | AU12 | AU14 | AU15 | AU17 | AU20 | AU23 | AU25 | AU26 | AU45
0.875 | 0.11 | 0.27 | 0 | 0.33 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.77 | 0 | 0.82 | 0 | 0 | 0
0.975 | 0.09 | 0 | 0 | 0.18 | 0.25 | 0.6 | 0 | 0.37 | 0.53 | 0.57 | 0 | 0 | 0 | 0.28 | 0.95 | 0.32 | 0

confidence | AU01 | AU02 | AU04 | AU05 | AU06 | AU07 | AU09
0.975 | 0.02 | 0.73 | 0.48 | 0.36 | 0 | 0 | 0.31
AU14 | AU15 | AU17 | AU20 | AU23 | AU25 | AU26 | AU45
0 | 0.96 | 0.74 | 0.44 | 0.05 | 0 | 0.16 | 0
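One such annotation can be represented as a confidence score plus a 17-dimensional AU activation vector. A hypothetical sketch using the values of the first sample above:

```python
# Hedged sketch: one annotation as a confidence score plus 17 AU activations,
# in the AU01..AU45 ordering used above. Structure is illustrative.
AU_NAMES = ["AU01", "AU02", "AU04", "AU05", "AU06", "AU07", "AU09", "AU10",
            "AU12", "AU14", "AU15", "AU17", "AU20", "AU23", "AU25", "AU26", "AU45"]

sample = {
    "confidence": 0.875,
    "aus": [0.11, 0.27, 0, 0.33, 0, 0, 0, 0, 0, 0, 0, 0.77, 0, 0.82, 0, 0, 0],
}

# Keep only the activated AUs (the part that actually drives the generator).
active = {name: v for name, v in zip(AU_NAMES, sample["aus"]) if v > 0}
print(active)  # {'AU01': 0.11, 'AU02': 0.27, 'AU05': 0.33, 'AU17': 0.77, 'AU23': 0.82}
```

The full 17-dim vector, not just the active entries, is what conditions the generator, so zero activations are informative too.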
Implementation - 2 Possible Methods
No.2 Pure CG Project
Various Directions | Occlusion | Accurate Annotation | AU Intensity | Data Production
Y | Y | Y | Continuous | Inefficient
[Architecture figure: tensor shapes 128*128*3, 128*128*17, 128*128*20]
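A plausible reading of these shapes (a numpy sketch, not the actual implementation): the 17-dim AU vector is tiled over the 128x128 spatial grid and concatenated channel-wise with the RGB image, giving the 20-channel generator input:

```python
import numpy as np

# Hedged sketch of the generator input construction: tile the AU vector over
# the spatial grid, then concatenate with the image along the channel axis.
image = np.zeros((128, 128, 3))                   # input face image
au_vec = np.zeros(17)                             # target AU activations
au_map = np.broadcast_to(au_vec, (128, 128, 17))  # tiled to 128x128x17
gen_input = np.concatenate([image, au_map], axis=-1)
print(gen_input.shape)  # (128, 128, 20)
```

Tiling the condition over the grid lets an ordinary convolutional encoder see the target expression at every spatial location.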
GANimation Architecture
GANimation on TensorLayer - Generator
GANimation on TensorLayer - Discriminator
Loss of Generator
Loss of Discriminator
GANimation on TensorLayer - Loss
Train
BATCH_SIZE = 25
EPOCHS = 30
lambda_D_img = 1
lambda_D_au = 4000
lambda_D_gp = 10
lambda_cyc = 10
lambda_mask = 0.1
lambda_mask_smooth = 1e-5
if e <= 21:
    lr_now = 1e-4
else:
    lr_now = 1e-5 * (EPOCHS + 1 - e)
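The rate schedule above can be checked in isolation; note that with this formula the rate steps down linearly from epoch 22 and ends at 1e-5 in epoch 30, rather than literally reaching zero. A standalone sketch of the same logic:

```python
# Standalone sketch of the learning-rate schedule used in training.
EPOCHS = 30

def lr_for_epoch(e):
    """Constant 1e-4 through epoch 21, then linear decay over the last epochs."""
    return 1e-4 if e <= 21 else 1e-5 * (EPOCHS + 1 - e)

print(lr_for_epoch(21))  # 0.0001
print(lr_for_epoch(30))  # 1e-05
```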
GANimation on TensorLayer - Train
Results – Single AUs’ Masks
Results – Multiple AUs
Thanks.