In Andrew Ng's Neural Networks and Deep Learning course on Coursera the logistic regression loss function for a single training example is given as:
$$ \mathcal L(a,y) = -\Big(y\log a + (1 - y)\log (1 - a)\Big) $$
Where $a$ is the activation of the neuron.
The following slide gives the partial derivatives, including:
$$\frac{\partial \mathcal L(a, y)}{\partial a} = -\frac{y}{a} + \frac{1-y}{1-a}$$
Why isn't the second term negative, i.e. $-\frac{1-y}{1-a}$, given the minus sign outside the brackets in the definition of $\mathcal L(a, y)$?
Best Answer
Using the Derivative Calculator, I found that I wasn't applying the chain rule.
$$\begin{align}\frac{\mathrm{d}}{\mathrm{d}a} \log(1-a) &= \frac{1}{1-a}\cdot \frac{\mathrm{d}}{\mathrm{d}a}\left[1-a\right] \\ &= \frac{1}{1-a}\cdot(-1) \\ &= -\frac{1}{1-a} \end{align}$$

So the derivative of $(1-y)\log(1-a)$ is $-\frac{1-y}{1-a}$, and the minus sign outside the brackets in $\mathcal L(a,y)$ flips it to $+\frac{1-y}{1-a}$, which is why the second term ends up positive.
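As a sanity check, the analytic derivative can be compared against a central finite difference of the loss. This is a minimal sketch (the function names `loss` and `dloss_da` are my own, not from the course):

```python
import math

def loss(a, y):
    # Logistic loss for a single example: L(a, y) = -(y*log(a) + (1-y)*log(1-a))
    return -(y * math.log(a) + (1 - y) * math.log(1 - a))

def dloss_da(a, y):
    # Analytic partial derivative from the slide: -y/a + (1-y)/(1-a)
    return -y / a + (1 - y) / (1 - a)

# Central-difference approximation at an arbitrary point
a, y, h = 0.3, 1.0, 1e-6
numeric = (loss(a + h, y) - loss(a - h, y)) / (2 * h)
print(abs(numeric - dloss_da(a, y)) < 1e-6)  # prints True: the signs agree
```

Trying `y = 0.0` as well confirms the positive sign on the $\frac{1-y}{1-a}$ term: the numerical slope matches $+\frac{1}{1-a}$, not $-\frac{1}{1-a}$.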