Best reconstruction loss for RGB images

conv-neural-network, image-processing, loss-functions, machine-learning

Which loss works best for pixel-wise reconstruction of RGB images of shape (3, width, height)?

It seems there are several options:

  1. The regression way. The input image has dimensions (3, width, height) with values in [0, 1]. Apply a sigmoid to the last layer of the generative CNN so that the output image has pixel values in [0, 1] for each channel, and simply use a pixel-wise L1 (or L2) loss.

  2. The multi-class cross-entropy way: treat each pixel value in each channel as one of 256 classes. Apply a softmax to the last layer and use cross-entropy loss to predict the class label in {0, ..., 255}. (Both options are sketched in code after this list.)
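
For concreteness, here is a minimal PyTorch sketch of both formulations; the shapes and the 256-class output head are illustrative assumptions, not part of the question:

```python
import torch
import torch.nn as nn

batch, H, W = 8, 32, 32

# --- Option 1: regression with sigmoid outputs in [0, 1] ---
logits = torch.randn(batch, 3, H, W)      # stand-in for the raw decoder output
target = torch.rand(batch, 3, H, W)       # ground-truth image scaled to [0, 1]

recon = torch.sigmoid(logits)             # squash each pixel/channel into [0, 1]
l1 = nn.L1Loss()(recon, target)           # pixel-wise L1 (MAE)
l2 = nn.MSELoss()(recon, target)          # pixel-wise L2 (MSE)

# --- Option 2: 256-way classification per pixel and channel ---
class_logits = torch.randn(batch, 256, 3, H, W)   # one score per intensity level
labels = (target * 255).long()                    # integer labels in {0, ..., 255}, shape (batch, 3, H, W)
ce = nn.CrossEntropyLoss()(class_logits, labels)  # softmax + NLL over the 256 classes
```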

What is the de facto loss for RGB image reconstruction?

Best Answer

The first option is sensible: it is the usual MAE/MSE, and these are used as reconstruction losses in many other situations. You can also use a cross-entropy loss on the $w \times h \times 3$ values.
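
A minimal sketch of one reading of "cross-entropy loss on the $w \times h \times 3$ values", assuming it means a per-pixel binary cross-entropy on the [0, 1] intensities (a common choice for autoencoder reconstructions); the shapes are illustrative:

```python
import torch
import torch.nn as nn

logits = torch.randn(8, 3, 32, 32)   # raw decoder output, no sigmoid applied yet
target = torch.rand(8, 3, 32, 32)    # ground-truth image with intensities in [0, 1]

# Per-pixel binary cross-entropy with soft targets; BCEWithLogitsLoss applies the
# sigmoid internally, which is numerically more stable than sigmoid + BCELoss.
bce = nn.BCEWithLogitsLoss()(logits, target)
```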

I do not recommend your second option, as the class labels destroy the ordinal relationship between the pixel values, i.e. $0 < 1 < \dots < 255$.
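
To make this concrete, here is a small illustrative comparison (the logit values are arbitrary): with a 256-way softmax, a prediction concentrated on intensity 254 incurs exactly the same cross-entropy as one concentrated on intensity 0 when the true value is 255, while MSE on the normalized intensities does reflect the distance.

```python
import torch
import torch.nn.functional as F

true_label = torch.tensor([255])   # true intensity for one pixel

close_logits = torch.zeros(1, 256)
close_logits[0, 254] = 5.0         # most mass on the "close" wrong class 254
far_logits = torch.zeros(1, 256)
far_logits[0, 0] = 5.0             # most mass on the "far" wrong class 0

print(F.cross_entropy(close_logits, true_label))  # same loss ...
print(F.cross_entropy(far_logits, true_label))    # ... as this one

# MSE on normalized intensities distinguishes the two errors:
print(F.mse_loss(torch.tensor([254 / 255.]), torch.tensor([1.0])))  # ~1.5e-5
print(F.mse_loss(torch.tensor([0 / 255.]), torch.tensor([1.0])))    # 1.0
```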