Keras – How to Manipulate Multiple Loss Functions in Keras: Practical Guide and Tips

Tags: keras, python, tensorflow

Let's assume that we have a model model_A and we want to build up backpropagation based on 3 different loss functions. The first loss (Loss_1) should be based on the output of model_A; Loss_2 and Loss_3 can come from something else. Think of it like a deviation from an unknown source, as in process automation when you build up your PID controller. The easiest way is my approach below, but it actually fails, because the graph isn't constructed the way I want: X_realB and X_realC have no connection to model_A and are ignored by Keras.

Any ideas how I could use additional loss functions without passing the values through model_A, while still influencing the minimization problem?

from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

def generator_model(model_A):
    model_A.trainable = True

    # inputs
    X_realA = Input(shape=image_shape)
    X_realB = Input(shape=image_shape)
    X_realC = Input(shape=image_shape)

    # generate fake image
    Fake_A = model_A(X_realA)

    model = Model([X_realA], [Fake_A, X_realB, X_realC])

    opt = Adam(lr=0.0002, beta_1=0.5)
    model.compile(loss=["mse", "mse", "mse"], loss_weights=[1, 1, 1], optimizer=opt)
    model.summary()
    return model

And as a second question: Is there a way to use non-differentiable operations in custom Keras layers, like tf.unique (counting elements in tensors), between two models, like this:

# inputs
X_realA = Input(shape=image_shape)

# generate fake image
Fake_A = model_A(X_realA)

# count the elements and reshape the tensor
_, _, counts = keras.layers.Lambda(lambda x: tf.unique_with_counts(x))(Fake_A)
new_Fake_A = keras.layers.Lambda(lambda x: tf.reshape(x, (something, something)))(counts)

Fake_B = model_B(new_Fake_A)

model = Model([X_realA], [Fake_A, Fake_B])

But with this approach the model is not working properly and isn't updating the weights of model_A. I thought it might be because tf.unique_with_counts produces new tensors which have no connection to the old ones, and there are also no gradients for it, but I assumed that is what the Lambda layer is for anyway. Any ideas how to tackle that problem?
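For illustration, a minimal check of that suspicion (the toy values here are made up and unrelated to my model): a loss built only from the counts returned by tf.unique_with_counts has no gradient path back to the input.

import tensorflow as tf

x = tf.Variable([1.0, 2.0, 2.0, 3.0])
with tf.GradientTape() as tape:
    _, _, counts = tf.unique_with_counts(x)
    loss = tf.reduce_sum(tf.cast(counts, tf.float32))

# prints None: neither the count op nor the int-to-float cast carries a
# gradient, so the path back to x is broken
print(tape.gradient(loss, x))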

Best Answer

Try constructing your model like so:

model = Model([X_realA, X_realB, X_realC], [Fake_A, X_realB, X_realC])

I have a hunch your code should work this way. However, if you want to update model_A using some loss calculated from X_realB and X_realC, that is not going to work. When you define the losses ["mse", "mse", "mse"], three separate losses are calculated, and only the nodes that contribute to each loss (i.e. to its output) are updated by backpropagation. Your model_A network does not contribute to the losses calculated from X_realB and X_realC.

If you want to update model_A, I would recommend implementing a custom loss function where the additional losses are added to the loss calculated from your Fake_A output. If I understand you correctly, you have a model output and some additional information about the environment the input measurement was taken in, and you want to use this additional information when calculating the loss for Fake_A. This is essentially additional information about the expected output, so I would pack X_realB and X_realC into the target annotation (y_true) and handle them in the custom loss.
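A minimal sketch of that idea, assuming channels-last images and that the target for Fake_A, X_realB and X_realC all share the same shape; the names combined_mse, target_A and y_packed are made up for illustration, and the exact form of the auxiliary terms is up to you:

import numpy as np
import tensorflow as tf
from tensorflow import keras

def combined_mse(y_true, y_pred):
    # y_true is assumed to hold [target_A, X_realB, X_realC] concatenated
    # on the channel axis; y_pred is Fake_A
    c = y_pred.shape[-1]
    target_A = y_true[..., :c]
    x_real_B = y_true[..., c:2 * c]
    x_real_C = y_true[..., 2 * c:]

    loss_1 = tf.reduce_mean(tf.square(y_pred - target_A))
    # because loss_2 and loss_3 are computed against y_pred, they now
    # produce gradients for model_A instead of being ignored constants
    loss_2 = tf.reduce_mean(tf.square(y_pred - x_real_B))
    loss_3 = tf.reduce_mean(tf.square(y_pred - x_real_C))
    return loss_1 + loss_2 + loss_3

# single-output model: X_realA in, Fake_A out (model_A as in the question)
model = keras.Model(X_realA, Fake_A)
model.compile(loss=combined_mse, optimizer=keras.optimizers.Adam(0.0002, beta_1=0.5))

# per batch, pack the target and the auxiliary images into one array:
# y_packed = np.concatenate([target_A_batch, X_realB_batch, X_realC_batch], axis=-1)
# model.train_on_batch(X_realA_batch, y_packed)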

If you can provide more information about your use case, maybe I can be of more help.


Edit 1:

In combined_loss you are adding constants to the loss calculated from Fake_A, so when taking the derivatives with respect to the model parameters they zero out. This comes from the linearity of differentiation: the derivative of a sum is the sum of the derivatives of its terms, and the derivative of a constant is zero. To put it simply, in your case:

deriv_wrt_params(loss+12+34) = deriv_wrt_params(loss) + deriv_wrt_params(12) + deriv_wrt_params(34) = deriv_wrt_params(loss) + 0 + 0
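You can verify this directly with a tiny GradientTape example (toy numbers, unrelated to your model):

import tensorflow as tf

w = tf.Variable(2.0)
x = tf.constant(3.0)

with tf.GradientTape(persistent=True) as tape:
    loss = tf.square(w * x - 1.0)
    loss_plus_consts = loss + 12.0 + 34.0

print(tape.gradient(loss, w))              # tf.Tensor(30.0, ...)
print(tape.gradient(loss_plus_consts, w))  # identical: the constants vanish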

Also, because you are using MSE, your generator will learn to output only ones, since you are punishing values that deviate from one:

loss0 = keras.losses.mse(FakeA, FakeA_ones)

I recommend using binary crossentropy instead.
If these added values are not related to the traditional identity loss, generator loss and consistency loss, but come from prior knowledge, you should combine them multiplicatively (or in some other way that interacts with the model output) so that they affect the gradients as well, not just the loss value.
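As a rough sketch of the multiplicative idea (make_weighted_loss and prior_weight are made-up names; how you derive the weight from X_realB / X_realC is up to you):

from tensorflow import keras

def make_weighted_loss(prior_weight):
    # prior_weight is a scalar (or per-sample tensor) computed from the
    # prior knowledge before compiling, e.g. from X_realB / X_realC
    def loss_fn(y_true, y_pred):
        base = keras.losses.binary_crossentropy(y_true, y_pred)
        # multiplying rescales the gradients too, unlike adding a constant
        return prior_weight * base
    return loss_fn

# model.compile(loss=make_weighted_loss(prior_weight=2.0), optimizer=opt)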

If you want to implement a CycleGAN with identity loss, consistency loss, etc., you will have to implement a custom training loop to update the generators and discriminators separately. For this I recommend the official TensorFlow 2.1 CycleGAN tutorial, where they implement a CycleGAN from start to finish.
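The core of such a loop looks roughly like this (heavily condensed compared to the tutorial; generator, discriminator, the two optimizers and the loss helpers are assumed to be defined elsewhere):

import tensorflow as tf

@tf.function
def train_step(real_A, real_B):
    with tf.GradientTape(persistent=True) as tape:
        fake_B = generator(real_A, training=True)
        disc_real = discriminator(real_B, training=True)
        disc_fake = discriminator(fake_B, training=True)

        gen_loss = generator_loss(disc_fake)                  # e.g. BCE against ones
        disc_loss = discriminator_loss(disc_real, disc_fake)  # real vs. fake

    # the generator and the discriminator get separate gradient updates
    gen_grads = tape.gradient(gen_loss, generator.trainable_variables)
    disc_grads = tape.gradient(disc_loss, discriminator.trainable_variables)
    gen_optimizer.apply_gradients(zip(gen_grads, generator.trainable_variables))
    disc_optimizer.apply_gradients(zip(disc_grads, discriminator.trainable_variables))
    del tape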