Solved – How to calculate the derivative of the cross-entropy error function

derivative, loss-functions, machine-learning, neural-networks, optimization

I'm reading the tutorial linked below on computing the derivative of the cross-entropy error. I think the author uses the loss function of logistic regression.
https://www.dropbox.com/s/rxrtz3auu845fuy/Softmax.pdf?dl=0

Most of the equations make sense to me, except for one thing. On the second page, there is:
$$\frac{\partial E_x}{\partial o^x_j}=\frac{t_j^x}{o_j^x}+\frac{1-t_j^x}{1-o^x_j}$$
However, on the third page, the "Crossentropy derivative" becomes

$$\frac{\partial E_x}{\partial o^x_j}=-\frac{t_j^x}{o_j^x}+\frac{1-t_j^x}{1-o^x_j}$$

Since $E_x$ carries a leading minus sign, I would expect the derivative to be $\frac{\partial E_x}{\partial o^x_j}=-\frac{t_j^x}{o_j^x}-\frac{1-t_j^x}{1-o^x_j}$. But it is not. What have I missed?


The tutorial:

[Screenshots of the tutorial's two pages; see the Dropbox link above.]

Best Answer

There is indeed a mistake: the expression on the second page is missing the overall minus sign, and the "Crossentropy derivative" on the third page is the correct one:

\begin{align}
\frac{\partial E_x}{\partial o_j^x}
&= \frac{\partial}{\partial o_j^x}\left(-\sum_{k}\left[t_k^x \log(o_k^x) + (1-t_k^x)\log(1-o_k^x)\right]\right) \\
&= -\frac{\partial}{\partial o_j^x}\sum_{k}\left[t_k^x \log(o_k^x) + (1-t_k^x)\log(1-o_k^x)\right] \\
&= -\frac{\partial}{\partial o_j^x}\left[t_j^x \log(o_j^x) + (1-t_j^x)\log(1-o_j^x)\right] \\
&= -\left(\frac{t_j^x}{o_j^x} - \frac{1-t_j^x}{1-o_j^x}\right) \qquad \text{(chain rule on the second $\log$)} \\
&= -\frac{t_j^x}{o_j^x} + \frac{1-t_j^x}{1-o_j^x}
\end{align}

The step you missed is the chain rule on the second term: $\frac{\partial}{\partial o_j^x}\log(1-o_j^x) = -\frac{1}{1-o_j^x}$, so this inner minus sign cancels the outer one and the second term comes out positive.
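As a sanity check on the signs, here is a minimal NumPy sketch (the names `E` and `grad_E` are mine, not from the tutorial) that compares the analytic derivative above against a central finite difference:

```python
import numpy as np

def E(o, t):
    """Cross-entropy error E_x = -sum_k [t_k log(o_k) + (1-t_k) log(1-o_k)]."""
    return -np.sum(t * np.log(o) + (1 - t) * np.log(1 - o))

def grad_E(o, t):
    """Analytic derivative from the answer: -t/o + (1-t)/(1-o)."""
    return -t / o + (1 - t) / (1 - o)

rng = np.random.default_rng(0)
o = rng.uniform(0.1, 0.9, size=5)             # outputs strictly inside (0, 1)
t = rng.integers(0, 2, size=5).astype(float)  # binary targets

# Central finite difference, one component at a time
eps = 1e-6
numeric = np.empty_like(o)
for j in range(o.size):
    d = np.zeros_like(o)
    d[j] = eps
    numeric[j] = (E(o + d, t) - E(o - d, t)) / (2 * eps)

print(np.max(np.abs(numeric - grad_E(o, t))))  # should be tiny, signs agree
```

The printed difference should be negligible (finite-difference error only). If the derivative really were $-\frac{t_j^x}{o_j^x}-\frac{1-t_j^x}{1-o_j^x}$ as you expected, the check would fail for every component with $t_j^x = 0$.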
