Solved – PyTorch Cross Entropy Loss implementation counterintuitive

cross-entropy, log-loss, neural-networks, python, torch

There is something I don't understand in the PyTorch implementation of Cross Entropy Loss.

As far as I understand, the theoretical Cross Entropy Loss takes log-softmax probabilities and outputs a real number that should get closer to zero as the output gets closer to the target (see https://ml-cheatsheet.readthedocs.io/en/latest/loss_functions.html#cross-entropy for reference).
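Concretely, writing $y$ for the one-hot target and $p$ for the predicted probabilities (my notation, not the reference's), the definition I have in mind is $$ H(y, p) = -\sum_i y_i \log(p_i), $$ which is $0$ when the probability assigned to the correct class is $1$.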

Yet the following puzzles me:

>>> import torch
>>> output=torch.tensor([[0.0,1.0,0.0]]) #Activation is only on the correct class
>>> target=torch.tensor([1])
>>> loss=torch.nn.CrossEntropyLoss()
>>> loss(output,target)
tensor(0.5514)

From my understanding, loss(output,target) should yield 0.0, since this is the textbook example of a 100% confident neural network.
The formula given in https://pytorch.org/docs/stable/nn.html#crossentropyloss does not convince me that it is strictly equivalent to the theoretical definition of cross entropy loss.

Is it a problem that my loss function is not equal to 0 when my model's output shows 100% confidence?

Best Answer

The documentation says that this loss function is computed using the log loss of the softmax of $x$ (output in your code). For your example, we have $$ \begin{align} -\log\left(\frac{\exp(x_j)}{\sum_i \exp (x_i)}\right)&= -x_j+\log\left(\sum_i \exp(x_i)\right) \\ &= -1 + \log\left( \exp(0) + \exp(1) + \exp(0) \right) \\ &\approx 0.5514. \end{align} $$
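For a quick numerical check, here is a minimal sketch (variable names are mine) that evaluates the right-hand side directly and compares it with the built-in loss:

import torch

x = torch.tensor([[0.0, 1.0, 0.0]])
target = torch.tensor([1])

# -x_j + log(sum_i exp(x_i)), with j = 1 the target class
manual = -x[0, 1] + torch.log(torch.exp(x).sum())
builtin = torch.nn.CrossEntropyLoss()(x, target)

print(manual.item(), builtin.item())  # both print roughly 0.5514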

To achieve the desired result, you could either have your network output raw scores (logits) as described in the documentation, or else use a loss function that works directly with probabilities.
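As a rough sketch of both options (the tensor values below are illustrative, not from the documentation): feed raw scores that strongly favour the correct class to CrossEntropyLoss, or, since NLLLoss expects log-probabilities, take the log of the probabilities and use NLLLoss:

import torch

target = torch.tensor([1])

# Option 1: raw scores (logits). A large score for the correct class pushes
# the softmax probability toward 1 and the loss toward 0.
logits = torch.tensor([[-100.0, 100.0, -100.0]])
print(torch.nn.CrossEntropyLoss()(logits, target))  # ~0

# Option 2: with probabilities in hand, take their log and use NLLLoss.
# NLLLoss only reads the log-probability of the target class, so here the
# loss is -log(1) = 0; the log(0) = -inf entries of the other classes are unused.
probs = torch.tensor([[0.0, 1.0, 0.0]])
print(torch.nn.NLLLoss()(torch.log(probs), target))  # ~0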