Do I calculate the loss of a regression on the normalized or denormalized data?

Tags: machine learning, neural networks, normalization, regression, sums-of-squares

A few months ago I implemented a simple MLP that should predict a numerical value from several input values. Basically, it's a regression task.

Everything worked fine: the results were what I expected and the training iterations worked well. I used the L2-normalization method to put the values between 0 and 1.

However, I had a thought yesterday: I currently calculate the loss by comparing the normalized target value with the normalized prediction using Mean Squared Error. I know it's not actually wrong, but is it optimal? Or should I calculate the MSE after denormalizing the data again? I know I could graph both of these values, but I need one for the backpropagation.

I won't provide code here, since I feel like this is a general question. In the tutorials I've seen, the data is not denormalized. It's been a while since I worked on the MLP, so maybe the idea of denormalizing the data for backpropagation doesn't even make sense. If so, feel free to correct me :)

Best Answer

If you train on transformed values of the outcome $(y)$, you will make predictions on the transformed scale. Think of the transformation as a unit conversion: if you teach the model to predict in terms of meters, it will do that. If you really want the result in terms of centimeters (miles, light years, whatever), then you have to invert the transformation to do the unit conversion. If you subtracted a number $a$ from each $y$ and then divided that difference by another number $b$ (which it seems like you did), you will recover the original scale by multiplying by $b$ and then adding $a$.
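As a minimal sketch of the unit-conversion idea, the snippet below applies a shift-and-scale transformation and then inverts it. The function names, the sample targets, and the choice of min-max scaling for $a$ and $b$ are assumptions made for illustration, not details from the question.

```python
import numpy as np

def transform(y, a, b):
    """Shift by a, then divide by b: z = (y - a) / b."""
    return (np.asarray(y, dtype=float) - a) / b

def inverse_transform(z, a, b):
    """Undo the transformation: y = z * b + a."""
    return np.asarray(z, dtype=float) * b + a

# Hypothetical targets on the original scale.
y = np.array([120.0, 150.0, 180.0, 210.0])

# Assumed min-max scaling: a is the offset, b is the range.
a = y.min()
b = y.max() - y.min()

z = transform(y, a, b)               # values the model would be trained on
y_back = inverse_transform(z, a, b)  # back on the original scale
```

Whatever values of $a$ and $b$ were used during training have to be stored, because the same pair is needed to convert predictions back.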

Consequently, if you want to know the loss in terms of the original scale (I think this is reasonable), you would apply that inverse transformation to the predictions and then calculate the loss based on those inverse-transformed predictions and the corresponding observed values that have not been transformed.
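A sketch of that procedure might look as follows, assuming the model outputs are on the transformed scale and that the same $a$ and $b$ from training are available. The helper names and all numbers are hypothetical.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between two arrays of equal length."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def mse_original_scale(y_true, z_pred, a, b):
    """Invert the scaling on the predictions, then compare with raw targets."""
    y_pred = np.asarray(z_pred, dtype=float) * b + a
    return mse(y_true, y_pred)

# Hypothetical untransformed observations and scaling constants.
y_true = np.array([120.0, 150.0, 180.0, 210.0])
a, b = 120.0, 90.0

# Hypothetical model outputs, which live on the transformed scale.
z_pred = np.array([0.0, 0.3, 0.7, 1.0])

loss_original = mse_original_scale(y_true, z_pred, a, b)
```

The resulting number is in squared units of the original outcome, which is usually what makes it interpretable.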

If you want to know the loss in terms of the transformed scale (I don't see why, but maybe you have a reason), then you would take the raw predictions made by the model (which are on the transformed scale) and compare those to their corresponding transformed observations.
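For a purely linear transformation of the form $(y - a)/b$, the two losses are related by a constant: the MSE on the original scale equals $b^2$ times the MSE on the transformed scale, so minimizing one also minimizes the other. The sketch below checks this numerically with hypothetical values; it assumes a shift-and-scale transformation and would not hold for a nonlinear one such as a logarithm.

```python
import numpy as np

# Hypothetical observations and scaling constants.
y_true = np.array([120.0, 150.0, 180.0, 210.0])
a, b = 120.0, 90.0

# Targets on the transformed scale, as seen by the model during training.
z_true = (y_true - a) / b

# Hypothetical raw model outputs (transformed scale).
z_pred = np.array([0.0, 0.3, 0.7, 1.0])

# Loss on the transformed scale: raw predictions vs. transformed targets.
loss_transformed = float(np.mean((z_true - z_pred) ** 2))

# Loss on the original scale: inverse-transformed predictions vs. raw targets.
loss_original = float(np.mean((y_true - (z_pred * b + a)) ** 2))

# For a shift-and-scale transformation this ratio equals b ** 2.
ratio = loss_original / loss_transformed
```

Because the two losses differ only by this constant factor, their gradients point in the same direction, and the choice mainly affects the effective step size and how the loss value is read.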