Loss Function Regression – Proper Loss Function for Uniform Target Distribution Regression

distributions, loss-functions, neural networks

I'm doing some simulations and I would like to estimate a real number that is uniformly distributed between minValue and maxValue, for instance between 20 and 30 (it's not an angle, so estimating its sine isn't appropriate). So far I have used the MSE loss, but when I plot a histogram of the estimates, they follow a Gaussian distribution.

After some research on the Internet, I saw that using the L2 norm assumes that the target is normally distributed (unrelated question: what is the mathematical reason for that?). However, the target follows a uniform distribution.

Therefore, what would be a good loss function to improve the distribution of the estimates? Could it be solved by using a bigger network? My network consists of 9 convolutional layers imitating the ResNet architecture, followed by a fully-connected layer that estimates the target.

Finally, since the target data is being simulated, I have access to infinite data.

Best Answer

"using the L2 norm assumes that the target is normally distributed"

Sorry, but this is nonsense. (There is a lot of nonsense on the internet.)

Your choice of error measure or loss function assumes nothing about the (conditional or unconditional) distribution of the target variable. Rather, different loss functions elicit different functionals of the target variable. The MSE will be minimized in expectation by the conditional mean, whether the conditional distribution is normal or Poisson. (Assuming this expectation exists, and we are not dealing with a Cauchy.) The MAE will be minimized in expectation by the conditional median. If your distribution is symmetric, like the normal or the uniform, the mean and the median coincide, so MSE and MAE will tend towards the same point prediction. If the distribution is asymmetric, like the Poisson, the two minimizers will be different.
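As a rough illustration of this point, here is a minimal NumPy sketch that searches a grid of constant predictions for the one with the smallest average loss. Everything in it is an assumption made for illustration: the helper name `best_constant`, the sample size, the grids, and the choice of an exponential distribution as the asymmetric example (the uniform range 20 to 30 is taken from the question). It uses unconditional samples, so "conditional mean" simply becomes "mean".

```python
import numpy as np


def squared_error(y, c):
    """Per-sample squared error of the constant prediction c."""
    return (y - c) ** 2


def absolute_error(y, c):
    """Per-sample absolute error of the constant prediction c."""
    return np.abs(y - c)


def best_constant(y, loss, grid):
    """Return the grid value c with the smallest average loss(y, c).

    Hypothetical helper for this sketch: a brute-force search over
    constant predictions, not a training procedure.
    """
    avg_loss = [np.mean(loss(y, c)) for c in grid]
    return grid[int(np.argmin(avg_loss))]


rng = np.random.default_rng(0)
n = 50_000

# Symmetric target: uniform on [20, 30], as in the question.
y_uniform = rng.uniform(20.0, 30.0, size=n)
grid_uniform = np.linspace(20.0, 30.0, 1001)

# Asymmetric target: exponential with mean 2 (an assumed example).
y_skewed = rng.exponential(scale=2.0, size=n)
grid_skewed = np.linspace(0.0, 6.0, 601)

uniform_mse = best_constant(y_uniform, squared_error, grid_uniform)
uniform_mae = best_constant(y_uniform, absolute_error, grid_uniform)
skewed_mse = best_constant(y_skewed, squared_error, grid_skewed)
skewed_mae = best_constant(y_skewed, absolute_error, grid_skewed)

print("uniform: MSE-optimal", uniform_mse, "sample mean", y_uniform.mean())
print("uniform: MAE-optimal", uniform_mae, "sample median", np.median(y_uniform))
print("skewed:  MSE-optimal", skewed_mse, "sample mean", y_skewed.mean())
print("skewed:  MAE-optimal", skewed_mae, "sample median", np.median(y_skewed))
```

For the uniform sample, both losses should pick a value close to 25. For the skewed sample, the MSE-optimal constant should track the sample mean and the MAE-optimal constant the sample median, which are clearly different. In neither case does the loss care whether the data are normal.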

You may find a paper of mine (Kolassa, 2020, IJF) useful. Or this thread: What are the shortcomings of the Mean Absolute Percentage Error (MAPE)?

Thus, your strategy should be to first decide which functional of your target distribution you are looking for - the mean, the median, a quantile, whatever. Then, and only then, can you choose an error measure that elicits this functional.
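If the functional you want is a quantile, one standard loss that elicits it is the pinball (quantile) loss. The sketch below is only an illustration under assumed settings: the function name `pinball_loss`, the level `tau = 0.9`, the sample size, and the grid are all my choices, and the target is again the uniform range from the question.

```python
import numpy as np


def pinball_loss(y, prediction, tau):
    """Average pinball loss of a constant prediction at quantile level tau.

    Under-predictions (y > prediction) are weighted by tau,
    over-predictions by (1 - tau).
    """
    diff = y - prediction
    return np.mean(np.maximum(tau * diff, (tau - 1.0) * diff))


rng = np.random.default_rng(1)
y = rng.uniform(20.0, 30.0, size=50_000)  # uniform target as in the question

tau = 0.9  # assumed quantile level for this example
grid = np.linspace(20.0, 30.0, 1001)

# Brute-force search for the constant prediction with the smallest loss.
losses = [pinball_loss(y, c, tau) for c in grid]
best = grid[int(np.argmin(losses))]

print("pinball-optimal constant:", best)
print("sample 0.9-quantile:     ", np.quantile(y, tau))
```

The minimizer should land near the sample 0.9-quantile (about 29 for a uniform target on 20 to 30), not near the mean of 25. With `tau = 0.5` the pinball loss is half the absolute error, so it recovers the median.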
