Neural Networks – Creating a Custom Loss Function: Tips and Advice

loss-functions, neural networks, tensorflow

I'm trying to solve a regression problem using a neural network. In my problem domain, an underestimation is a lot worse than an overestimation, so I thought I'd create a custom loss function for my network.
Currently, I'm thinking about something along the lines of:
$$Loss(pred, label) = \begin{cases}
x & \text{if } pred - label \geq 0\\
x^2 & \text{if } pred - label < 0
\end{cases}$$

There's one problem I can already see upfront: the function is not differentiable at $pred - label = 0$.

My question here is two-fold:

  1. What can I do to solve the differentiability problem?
  2. What other factors are important when choosing/designing loss functions?

(My network will be implemented in TensorFlow, in case this is relevant)

Best Answer

  1. TensorFlow works fine even if your loss function is non-differentiable at a finite set of points. In fact, the ReLU activation function, which is widely used in deep learning models, is not differentiable at x = 0 either. Automatic differentiation, as implemented in TensorFlow and similar software, does not require your function to be differentiable everywhere.

  2. Choosing a proper loss function is highly problem-dependent; there is no one-size-fits-all solution. Choosing models, loss functions, learning algorithms, and hyperparameters takes a lot of experimentation. Note, however, that the absolute value of an arbitrary loss function does not carry enough information to tell whether your model is good, and it cannot be used to compare directly against other models. For example, if your loss function is $L(\theta)$, then $L'(\theta) = cL(\theta)$ with $c > 0$ is an equally valid loss function: it has the same minimizer, yet its values differ by the factor $c$.

What you need is to find an unbiased metric that tells you whether your model is good, and to use it for comparison with other models. For a classification task, for example, accuracy is commonly used to measure model performance. Note that although the goal is to increase accuracy, we use CrossEntropyLoss instead of $1 - accuracy$ as the loss, because accuracy is piecewise constant in the model parameters: its gradient is zero almost everywhere, so it gives backpropagation nothing to work with.
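As a rough illustration of why accuracy makes a poor training signal, here is a minimal sketch in plain Python. The one-parameter logistic model, the toy data, and the helper names (`sigmoid`, `accuracy`, `cross_entropy`) are all assumptions made up for this example, not part of any library.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def accuracy(w, xs, ys):
    """Fraction of examples whose thresholded prediction matches the label."""
    hits = sum((sigmoid(w * x) >= 0.5) == (y == 1) for x, y in zip(xs, ys))
    return hits / len(xs)

def cross_entropy(w, xs, ys):
    """Mean binary cross-entropy of the one-parameter model p = sigmoid(w * x)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(xs)

# Toy data, invented for illustration: positive x belongs to class 1.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

# Nudge the weight slightly and compare how each quantity responds.
for w in (0.5, 0.6):
    print(w, accuracy(w, xs, ys), cross_entropy(w, xs, ys))
```

Accuracy is identical for both weights, so a small parameter step produces no change to learn from, while the cross-entropy decreases and therefore still indicates a direction of improvement.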

To summarize: make clear what your final goal is, choose an unbiased metric that can be used to compare against other candidate models, and define your objective and loss function accordingly. Sometimes (but not often) your evaluation metric can double as your loss function, if it is easily differentiable and can be compared across models directly.
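Point 1 can be checked directly. The sketch below implements the piecewise loss from the question with `tf.where` and asks `tf.GradientTape` for the gradient below, at, and above the kink. It assumes that $x$ in the question means $pred - label$ and that the per-example losses are averaged; the function name `asymmetric_loss` and the sample values are made up for illustration.

```python
import tensorflow as tf

def asymmetric_loss(y_true, y_pred):
    """Piecewise loss from the question, assuming x = pred - label."""
    x = y_pred - y_true
    # Linear branch where x >= 0, quadratic branch where x < 0.
    per_example = tf.where(x >= 0.0, x, tf.square(x))
    # Averaging over examples is an assumption; the question does not specify it.
    return tf.reduce_mean(per_example)

# Three predictions: below the label, exactly on it (the kink), and above it.
labels = tf.constant([1.0, 1.0, 1.0])
preds = tf.Variable([0.5, 1.0, 2.0])

with tf.GradientTape() as tape:
    loss = asymmetric_loss(labels, preds)

grads = tape.gradient(loss, preds)
print(loss.numpy(), grads.numpy())
```

All three gradient entries come back finite. At the kink, `tf.where` selects the linear branch because the condition `x >= 0` holds there, so that branch's slope is used. If you train with Keras, a function with this `(y_true, y_pred)` signature can typically be passed as the `loss` argument of `model.compile`.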
