Solved – Why can’t this function be used as a loss function

loss-functions, machine-learning, neural-networks

In a discussion, a friend mentioned that the function below cannot be optimized, so it can't be used in a learning algorithm.

$$E_{in} = \frac{1}{N} \sum_{n=1}^{N} \mathbb{1}\big[h(x_n) \ne f(x_n)\big]$$

Why can't this function be used as a loss function?

The Context

This is about machine learning: minimizing the error of a model on a dataset $D$ of size $N$.
I mean comparing the algorithm's predictions against the actual outcomes recorded from the real world, i.e., scoring each prediction against its true value with a cost function.

$f$ is the "true" mapping and $h$ is my "model".

$h$ should approximate $f$.

Could someone also explain why it isn't differentiable?

Best Answer

The loss function in the original post is the 0-1 loss averaged over the dataset, which is in fact the quantity most classification settings ultimately care about. The 0-1 loss divided by $N$ is simply the misclassification rate, i.e., one minus accuracy, so minimizing it is equivalent to maximizing accuracy.
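A quick sketch of that equivalence (the labels and predictions below are made up purely for illustration):

```python
import numpy as np

# Hypothetical true labels and model predictions, for illustration only.
y_true = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 0, 0, 0])

# 0-1 loss divided by N = fraction of misclassified points.
zero_one = np.mean(y_pred != y_true)   # 0.4
accuracy = np.mean(y_pred == y_true)   # 0.6

print(zero_one + accuracy)             # always 1.0
```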

What your friend meant is that it is difficult to use the 0-1 loss directly for training a model. This is true for several reasons, but mostly because the loss is not differentiable: as a function of the model parameters it is piecewise constant, so its gradient is zero almost everywhere and undefined wherever a prediction flips, which gives gradient-based optimizers no signal to follow. Many other loss functions used in classification, for example the negative log-likelihood of the data, can be viewed as smooth surrogates for the 0-1 loss.
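To see the "zero gradient almost everywhere" problem concretely, here is a minimal sketch with a single training point and a linear threshold model (all names and values are made up): the 0-1 loss is a step function of the weight $w$, while the log loss changes smoothly and therefore provides a usable gradient.

```python
import numpy as np

# One training point (x = 1.0, label y = 1) and a threshold model
# h(x) = 1 if w * x > 0 else 0.  Values are illustrative only.
x, y = 1.0, 1

for w in [-0.2, -0.1, -1e-6, 1e-6, 0.1, 0.2]:
    pred = 1 if w * x > 0 else 0
    zero_one = int(pred != y)            # flat on each side of w = 0
    p = 1.0 / (1.0 + np.exp(-w * x))     # sigmoid probability of class 1
    log_loss = -np.log(p)                # smooth surrogate loss
    print(f"w={w:+.6f}  0-1 loss={zero_one}  log loss={log_loss:.4f}")
```

The 0-1 loss jumps from 1 to 0 exactly at $w = 0$ and is constant everywhere else, so a gradient step learns nothing from it; the log loss decreases smoothly as $w$ grows.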

However, since the 0-1 loss is so intuitive and straightforward, it is still often reported as a metric while training and assessing models, even though the optimizer works on a surrogate. Personally, I like to print out both the likelihood and the 0-1 loss.
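A minimal sketch of that practice, assuming a toy logistic regression in plain NumPy (all data and names here are illustrative): the gradient step uses the smooth log loss, while the 0-1 error is merely printed alongside it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy, roughly linearly separable data; purely illustrative.
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w = np.zeros(2)
lr = 0.5

for epoch in range(5):
    p = 1.0 / (1.0 + np.exp(-X @ w))     # sigmoid probabilities
    log_loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    zero_one = np.mean((p > 0.5) != y)   # monitored, not optimized
    print(f"epoch {epoch}: log loss={log_loss:.4f}  0-1 error={zero_one:.3f}")

    w -= lr * X.T @ (p - y) / len(y)     # gradient step on the log loss
```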