Neural Networks – Different Definitions of Cross Entropy Loss Function Explained

cross-entropy, loss-functions, neural-networks, softmax

I started off learning about neural networks with the neuralnetworksanddeeplearning.com tutorial. In particular, the 3rd chapter has a section on the cross-entropy function, which defines the cross-entropy loss as:

$C = -\frac{1}{n} \sum\limits_x \sum\limits_j (y_j \ln a^L_j + (1-y_j) \ln (1 - a^L_j))$

However, the TensorFlow introduction defines the cross-entropy loss as:

$C = -\frac{1}{n} \sum\limits_x \sum\limits_j (y_j \ln a^L_j)$ (when using the same symbols as above)

Then, searching around to find out what was going on, I found another set of notes (https://cs231n.github.io/linear-classify/#softmax-classifier) that uses a completely different definition of the cross-entropy loss, albeit this time for a softmax classifier rather than for a neural network.

Can someone explain to me what is going on here? Why are there discrepancies between the ways people define the cross-entropy loss? Is there some overarching principle?

Best Answer

These three definitions are essentially the same.

1) The TensorFlow introduction (writing $a_j$ for the output $a^L_j$), $$C = -\frac{1}{n} \sum\limits_x\sum\limits_{j} (y_j \ln a_j).$$
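As a concrete reference point, here is a minimal NumPy sketch of this definition, averaging over a batch of $n$ samples (the array names and the toy values are just for illustration):

```python
import numpy as np

def categorical_cross_entropy(y, a, eps=1e-12):
    """C = -1/n * sum over samples x and classes j of y_j * ln(a_j).

    y, a : arrays of shape (n_samples, n_classes); rows of `a` are
    predicted probabilities, rows of `y` are target distributions.
    """
    a = np.clip(a, eps, 1.0)                      # avoid log(0)
    return -np.mean(np.sum(y * np.log(a), axis=1))

# toy batch of 2 samples and 3 classes
y = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])
a = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6]])
print(categorical_cross_entropy(y, a))            # = -(ln 0.7 + ln 0.6) / 2
```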

2) For binary classification, $j$ runs over two classes, and it becomes $$C = -\frac{1}{n} \sum\limits_x (y_1 \ln a_1 + y_2 \ln a_2),$$ and because of the constraints $\sum_ja_j=1$ and $\sum_jy_j=1$, it can be rewritten as $$C = -\frac{1}{n} \sum\limits_x (y_1 \ln a_1 + (1-y_1) \ln (1-a_1)),$$ which is the same form as in the 3rd chapter.
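A quick numerical check of this equivalence (a sketch, assuming a two-class softmax output with $a_2 = 1 - a_1$ and a one-hot target):

```python
import numpy as np

# one binary sample: true class is class 1, probabilities come from a 2-way softmax
y1, a1 = 1.0, 0.8
y2, a2 = 1.0 - y1, 1.0 - a1      # constraints: sum_j y_j = 1, sum_j a_j = 1

two_term = -(y1 * np.log(a1) + y2 * np.log(a2))            # sum over j = 1, 2
binary   = -(y1 * np.log(a1) + (1 - y1) * np.log(1 - a1))  # chapter-3 binary form
print(np.isclose(two_term, binary))                         # True
```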

3) Moreover, if $y$ is a one-hot vector (which is commonly the case for classification labels) with $y_k$ being the only non-zero element, then the cross entropy loss of the corresponding sample is $$C_x=-\sum\limits_{j} (y_j \ln a_j)=-(0+0+...+y_k\ln a_k)=-\ln a_k.$$
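In code, the same simplification means the full sum over classes collapses to picking out the log-probability of the true class (a sketch with a hypothetical 4-class example):

```python
import numpy as np

a = np.array([0.1, 0.2, 0.6, 0.1])   # predicted probabilities for one sample
k = 2                                # index of the true class
y = np.zeros_like(a)
y[k] = 1.0                           # one-hot target

full_sum   = -np.sum(y * np.log(a))  # -sum_j y_j ln a_j
picked_out = -np.log(a[k])           # -ln a_k
print(np.isclose(full_sum, picked_out))  # True
```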

In the cs231n notes, the cross entropy loss of one sample is given, together with the softmax normalization, as $$C_x=-\ln(a_k)=-\ln\left(\frac{e^{f_k}}{\sum_je^{f_j}}\right).$$
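Putting the two steps together, this per-sample loss can be computed directly from the raw scores (logits) $f$. A minimal sketch, with the usual max-subtraction added purely for numerical stability (an implementation detail, not part of the formula):

```python
import numpy as np

def softmax_cross_entropy(f, k):
    """-ln( e^{f_k} / sum_j e^{f_j} ) for raw scores f and true class index k."""
    f = f - np.max(f)                          # shift scores; the softmax is unchanged
    log_probs = f - np.log(np.sum(np.exp(f)))  # log-softmax of each class
    return -log_probs[k]

f = np.array([2.0, 1.0, 0.1])   # example scores
print(softmax_cross_entropy(f, k=0))
```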