[Math] Understanding partial derivative of logistic regression cost function

calculus, derivatives, logistic-regression, partial-derivative

I'm following along in Andrew Ng's great lecture series on machine learning, and he presents the following as the cost function for a logistic regression model [link]:

$$L(a,y) = -(y \log(a) + (1 - y) \log(1 - a))$$

He then builds a small computation graph, a series of equations that can be used as helpers for computing the partial derivatives of $L$ with respect to various variables [link]:

$$ z = w_1x_1 + w_2x_2 + b $$
$$ \hat{y} = a = \sigma(z) $$

Next, he says that the following is the partial derivative of $L$ with respect to $a$ [link]:

$$ \frac{\partial L}{\partial a} = -\frac{y}{a} + \frac{1-y}{1-a} $$

Unfortunately, he doesn't give any clues as to how this can be derived. Does anyone here know how to derive this partial derivative given the equations above? I'd be very grateful for any insights others can offer on this question!

Best Answer

If your equation is

$$L(a,y)=-\left(y\log(a)+(1-y)\log(1-a)\right)$$

then, differentiating with respect to $a$, we get

$$\frac{\partial L(a,y)}{\partial a}=-\left(\frac{y}{a}+\frac{1-y}{1-a}\cdot (-1)\right)$$

which simplifies to

$$-\frac{y}{a} + \frac{1-y}{1-a}$$

since

$$(\log(a))'=\frac{1}{a}$$

and, by the chain rule,

$$(\log(1-a))'=\frac{1}{1-a}\cdot (-1).$$