Solved – Asymmetric cost function in neural networks

deep-learning, loss-functions, r

I am trying to build a deep neural network with an asymmetric loss function that penalizes underestimation of a time series, preferably the LINEX loss function (Varian 1975):
$ \quad \quad
L_{a,b}(y,\hat{y}) = b\left(e^{-a(y-\hat{y})} + a(y-\hat{y}) - 1\right), \quad \quad \quad \text{with } a \neq 0 \text{ and } b>0
$

but I can't find any research papers where this is done, and only very few on other asymmetric loss functions.
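For concreteness, the loss is straightforward to write down in R. A minimal sketch, using only the formula above; note that the sign of $a$ selects which side of the error gets the exponential penalty:

```r
# LINEX loss exactly as defined above; the sign of `a` selects which
# side of the error is penalized exponentially.
linex <- function(y, y_hat, a, b = 1) {
  d <- y - y_hat
  b * (exp(-a * d) + a * d - 1)
}

# With a = -2, underestimation (y_hat < y) is penalized exponentially,
# overestimation only roughly linearly:
linex(y = 1, y_hat = 0.5, a = -2)  # underestimation: ~0.718
linex(y = 1, y_hat = 1.5, a = -2)  # overestimation:  ~0.368
```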

The function is differentiable and gives reasonable results for values of $a \approx 0$ using neuralnet(), for which the loss function approximates a squared-error function, but very poor results for increasing values of $a$.

This might explain why there are not many papers on asymmetric loss functions in neural networks, but why does it perform so badly when the asymmetry becomes larger?

EDIT

By asymmetric loss functions, I mean loss functions that are biased, with different slopes for negative and positive errors. Examples are given in the figure below.
[Figure: a few examples of asymmetric loss functions]

Concerning my network:
I used the neuralnet() package, testing several options with one hidden layer for both sigmoid and tanh activation functions, and an identity function at the output. In the LINEX loss function stated above, $y$ is the desired output and $\hat{y}$ the activation output of the network. I have min-max normalized all 8 inputs as well as the output $y$.
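For reference, here is a minimal sketch of how this setup might look. neuralnet() accepts a custom differentiable error function via its err.fct argument (first argument the network output, second the target); the data frame and column names below are hypothetical, and the constants a = -2, b = 1 are written out as literals because neuralnet differentiates the error-function body symbolically:

```r
library(neuralnet)

# Hypothetical data frame `df` with 8 inputs x1..x8 and one target y;
# min-max normalize all columns to [0, 1], as described above.
minmax <- function(v) (v - min(v)) / (max(v) - min(v))
df_norm <- as.data.frame(lapply(df, minmax))

# LINEX with a = -2, b = 1 written out explicitly:
# b * (exp(-a * (y - x)) + a * (y - x) - 1)
err.linex <- function(x, y) exp(2 * (y - x)) - 2 * (y - x) - 1

fit <- neuralnet(
  y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8,
  data          = df_norm,
  hidden        = 5,        # one hidden layer; the size is a guess
  act.fct       = "tanh",   # or "logistic" for sigmoid units
  linear.output = TRUE,     # identity output, as described above
  err.fct       = err.linex
)
```

Whether training converges for larger values of |a| is, of course, exactly the issue in the question.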

With the statement

if $a \approx 0$, the loss function approximates a squared-error function
I mean that the form of the LINEX loss looks similar to a (symmetric) squared-error function; see the picture below for an example of the LINEX loss with b = 1 and a = 0.001.
[Figure: LINEX loss with b = 1 and a = 0.001]
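This behaviour follows directly from a second-order Taylor expansion of the exponential: writing $\epsilon = y-\hat{y}$,
$ \quad \quad
e^{-a\epsilon} = 1 - a\epsilon + \tfrac{1}{2}a^2\epsilon^2 + O(a^3\epsilon^3),
$
so that
$ \quad \quad
L_{a,b}(y,\hat{y}) \approx \tfrac{1}{2}ba^2(y-\hat{y})^2,
$
i.e. for small $a$ the LINEX loss is just a scaled squared-error loss.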

To restate my question: is there more research that works with asymmetric loss functions in neural networks (preferably the LINEX loss)? If not, why not, given that it is widely used for other model types?

Best Answer

This might explain why there are not many papers on asymmetric loss functions.

That's not true. Cross-entropy is used as the loss function in most classification problems (and in problems that aren't standard classification, such as autoencoder training and segmentation), and it's not symmetric.
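One way to see the asymmetry concretely: binary cross-entropy is not symmetric in its two arguments, whereas squared error is. A quick check in R, with the loss written out by hand:

```r
# Binary cross-entropy, written out by hand
ce <- function(y, p) -(y * log(p) + (1 - y) * log(1 - p))

ce(0.9, 0.6)   # ~0.551
ce(0.6, 0.9)   # ~0.984 -> swapping target and prediction changes the loss
(0.9 - 0.6)^2  # squared error gives 0.09 either way
```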
