MATLAB: Two Issues about MATLAB’s Official Example of GAN

Deep Learning Toolbox, gan

I am trying MATLAB's official example of a GAN (https://www.mathworks.com/help/deeplearning/examples/train-generative-adversarial-network.html). There are a couple of issues I want to ask about.
  1. After setting a large number of epochs, say 5000, the code crashed after 3833 iterations (I think that is simply an arbitrary point in a long run) with the following errors:
Error using nnet.internal.cnn.dlnetwork/predict
(line 198)
Layer 'bn1': Invalid input data. The value of
'Variance' is invalid. Expected input to be
positive.
Error in dlnetwork/predict (line 205)
[varargout{1:nargout}] =
predict(net.PrivateNetwork, x,
layerIndices, layerOutputIndices);
Error in GAN_Test (line 143)
dlXGeneratedValidation = predict(dlnetGenerator,dlZValidation);
Also note that this does not happen only once, but multiple times with arbitrarily large numbers of epochs. As per the error message, I think 'bn1' refers to
batchNormalizationLayer('Name','bn1')
in the generator, which takes its input from
imageInputLayer([1 1 numLatentInputs],'Normalization','none','Name','in')
transposedConv2dLayer(filterSize,8*numFilters,'Name','tconv1')
So I think this is what the paper describes: "one of the main failure modes for GAN is for the generator to collapse to a parameter setting where it always emits the same point" (quoted from https://arxiv.org/pdf/1606.03498.pdf), which here happens after training the generator over many epochs.
I am therefore wondering whether MATLAB could issue a warning, or force the variance inside the predict() function to always be positive, say, by adding an eps.
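In the meantime, a possible workaround on my side (just a sketch, assuming the running statistics of 'bn1' are stored in dlnetGenerator.State as the usual Layer/Parameter/Value table) is to clamp 'TrainedVariance' to a small positive floor before calling predict() on the validation inputs:
% Workaround sketch: floor the batch-normalization running variances
% of the generator at a small positive value before predict().
state = dlnetGenerator.State;                 % table: Layer | Parameter | Value
idx = state.Parameter == "TrainedVariance";   % rows holding the running variances
state.Value(idx) = cellfun(@(v) max(v,single(1e-5)), state.Value(idx), 'UniformOutput', false);
dlnetGenerator.State = state;
dlXGeneratedValidation = predict(dlnetGenerator,dlZValidation);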
2. As I mentioned in another post, the ganLoss(…) function in fact appends a Sigmoid layer at the end of the discriminator, and the loss is calculated AFTER it.
function [lossGenerator, lossDiscriminator] = ganLoss(dlYPred,dlYPredGenerated)
% Calculate losses for the discriminator network.
lossGenerated = -mean(log(1-sigmoid(dlYPredGenerated)));
lossReal = -mean(log(sigmoid(dlYPred)));
% Combine the losses for the discriminator network.
lossDiscriminator = lossReal + lossGenerated;
% Calculate the loss for the generator network.
lossGenerator = -mean(log(sigmoid(dlYPredGenerated)));
end
And yet when dlgradient(...) is called later, it seems to start from the last layers of the discriminator and the generator, respectively, as shown in the example code:
% There is NO Sigmoid layer in either of the dlnet
gradientsGenerator = dlgradient(lossGenerator, dlnetGenerator.Learnables,'RetainData',true);
gradientsDiscriminator = dlgradient(lossDiscriminator, dlnetDiscriminator.Learnables);
I am therefore wondering whether, as per the chain rule, the loss should first go through the derivative of the sigmoid before it is sent back to the discriminator and the generator, respectively. Specifically:
% Pseudo code
Final_Loss = -mean(log(sigmoid(dlYPred)));
% For one input
Del(Final_Loss)/Del(dlYPred)
= Del(Final_Loss)/Del(log(sigmoid(dlYPred))) * Del(log(sigmoid(dlYPred)))/Del(dlYPred)
= -(1/sigmoid(dlYPred)) * sigmoid(dlYPred) * (1-sigmoid(dlYPred))
= sigmoid(dlYPred) - 1
% So I reckon that the following should be calculated and the last two backpropagated
Loss_G2D = -mean(-sigmoid(dlYPredGen));
Loss_D2D = -mean(1-sigmoid(dlYPredReal));
Loss_D = Loss_D2D + Loss_G2D;
Loss_G = -mean(1-sigmoid(dlYPredGen));
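(A quick numeric sanity check of the hand derivative above; this is not from the example, and sig is just a local anonymous sigmoid:)
% Central finite difference of f(x) = -log(sigmoid(x)) vs. the analytic sigmoid(x) - 1
sig = @(x) 1./(1 + exp(-x));
x = 0.7;                                       % arbitrary test point
h = 1e-6;
fd = (-log(sig(x+h)) + log(sig(x-h)))/(2*h);   % numeric derivative
analytic = sig(x) - 1;                         % hand-derived value
disp([fd analytic])                            % the two should agree closely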
Please do correct me if I am wrong, thanks.

Best Answer

Hi Theron,
Re: 2. As I mentioned in another post, the ganLoss(...) function in fact appends a Sigmoid layer at the end of the discriminator, and the loss is calculated AFTER it.
function [lossGenerator, lossDiscriminator] = ganLoss(dlYPred,dlYPredGenerated)
% Calculate losses for the discriminator network.
lossGenerated = -mean(log(1-sigmoid(dlYPredGenerated)));
lossReal = -mean(log(sigmoid(dlYPred)));
% Combine the losses for the discriminator network.
lossDiscriminator = lossReal + lossGenerated;
% Calculate the loss for the generator network.
lossGenerator = -mean(log(sigmoid(dlYPredGenerated)));
end
*** Yes, the sigmoid layer is not part of the Discriminator and hence the sigmoid function is applied before loss computation. The loss for Discriminator is based on Eq. 1 in the GAN paper: https://arxiv.org/pdf/1406.2661.pdf. For the Generator, the loss is based on log(D(G(z))) rather than log(1-D(G(z))) as suggested in the paper (paragraph before Figure 1).
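(For reference, Eq. 1 of that paper is the minimax objective
min_G max_D V(D,G) = E_{x~p_data}[ log D(x) ] + E_{z~p_z}[ log(1 - D(G(z))) ],
and the paragraph before Figure 1 suggests training G to maximize log(D(G(z))) instead of minimizing log(1 - D(G(z))), which is what lossGenerator in the code above implements.)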
Re: And yet when dlgradient(...) is adopted later, it seems to get started from the last layers of the discriminator and the generator, respectively, as it is shown in the example code
% There is NO Sigmoid layer in either of the dlnet
gradientsGenerator = dlgradient(lossGenerator, dlnetGenerator.Learnables,'RetainData',true);
gradientsDiscriminator = dlgradient(lossDiscriminator, dlnetDiscriminator.Learnables);
I am therefore wondering whether, as per the chain rule, the loss should first go through the derivative of the sigmoid before it is sent back to the discriminator and the generator, respectively.
*** dlgradient calculates the gradient of an output variable w.r.t. a set of input variables. It backpropagates through all operations needed to produce the output. So:
gradientsDiscriminator = dlgradient(lossDiscriminator, dlnetDiscriminator.Learnables);
computes the gradient of lossDiscriminator w.r.t. the Discriminator weights (the Learnables) by backpropagating through all operations that were used to calculate lossDiscriminator; this includes the sigmoid as well as other operations such as mean and log:
% Calculate losses for the discriminator network.
lossGenerated = -mean(log(1-sigmoid(dlYPredGenerated)));
lossReal = -mean(log(sigmoid(dlYPred)));
% Combine the losses for the discriminator network.
lossDiscriminator = lossReal + lossGenerated;
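If you want to convince yourself, here is a minimal check (a sketch of my own, not part of the example; realLossGradient is just a helper name I made up) that dlgradient already includes the sigmoid/log/mean derivatives and reproduces the hand-derived factor sigmoid(x) - 1:
% Compare automatic differentiation with the hand-derived gradient of lossReal
x = dlarray(randn(1,5,'single'));
gradAuto = dlfeval(@realLossGradient, x);
gradManual = (sigmoid(x) - 1)./numel(x);       % chain rule by hand, including the mean
max(abs(extractdata(gradAuto - gradManual)))   % should be close to zero

function grad = realLossGradient(x)
% Same form as lossReal in ganLoss, traced for automatic differentiation.
loss = -mean(log(sigmoid(x)));
grad = dlgradient(loss, x);
end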
Hope this helps,
Gautam