I am reading about logistic regression (from https://piazza-resources.s3.amazonaws.com/h61o5linlbb1v0/h8exwp8dmm44ok/classificationV6.pdf?AWSAccessKeyId=AKIAIEDNRLJ4AZKBW6HA&Expires=1485650876&Signature=Rd4BqBgb4hPwWUjxAyxJNfPhklU%3D) and am looking at the negative log likelihood function. They take the gradient with respect to the weights and produce the result at the bottom of page 7. I calculated this myself and can't seem to get the solution that they arrived at.
They set
$$NLL = -\sum_{i=1}^N\left[(1-y_i)\log(1-s(w^Tx_i))+y_i\log(s(w^Tx_i))\right]$$
where $s$ is the sigmoid function $s(x) =\frac{1}{e^{-x}+1}$
When I take $\frac{\partial NLL}{\partial w}$, I get
$$ -\sum_{i=1}^N \left( \frac{x_i(y_i-1)e^{w^Tx_i}}{e^{w^Tx_i}+1} + \frac{x_iy_i}{e^{w^Tx_i}+1}\right)$$
and not $$ \sum_{i=1}^N (s(w^Tx_i)-y_i)\,x_i$$
I must be making a mistake since this is just a simple gradient calculation. Can anyone shed some light onto how this was computed?
Best Answer
It is a simple calculation, but one can easily make a mistake. We have
$$\frac{\partial s(x)}{\partial x} = s(x)(1 - s(x)), \qquad \frac{\partial s(w^Tx_i)}{\partial w} = x_i\,s(w^Tx_i)(1 - s(w^Tx_i)), \qquad \frac{\partial \log(x)}{\partial x} = \frac{1}{x}$$
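Applying the chain rule with these facts to each log term gives the intermediate step (spelled out here for clarity):

$$\frac{\partial}{\partial w}\log s(w^Tx_i) = \frac{x_i\,s(w^Tx_i)(1-s(w^Tx_i))}{s(w^Tx_i)} = x_i(1-s(w^Tx_i)), \qquad \frac{\partial}{\partial w}\log(1-s(w^Tx_i)) = -x_i\,s(w^Tx_i)$$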
so the derivative is
$$\frac{\partial NLL}{\partial w} = \sum_{i=1}^{N}\left[(1 - y_i)\frac{x_i\,s(w^Tx_i)(1 - s(w^Tx_i))}{1 - s(w^Tx_i)} - y_i\,\frac{x_i\,s(w^Tx_i)(1 - s(w^Tx_i))}{s(w^Tx_i)}\right]$$
and I checked that, since $(1-y_i)\,s(w^Tx_i) - y_i\,(1 - s(w^Tx_i)) = s(w^Tx_i) - y_i$, it indeed simplifies to
$$\sum_{i=1}^{N} x_i(s(w^Tx_i) - y_i)$$
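If you want to double-check the algebra without redoing it by hand, a quick numerical sanity check comparing this closed-form gradient against central finite differences works well (a small sketch with made-up random data, not from the linked notes):

```python
import numpy as np

# Hypothetical small dataset: rows of X are the x_i, labels y_i in {0, 1}
rng = np.random.default_rng(0)
N, d = 20, 3
X = rng.normal(size=(N, d))
y = rng.integers(0, 2, size=N)
w = rng.normal(size=d)

def s(z):
    # sigmoid s(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

def nll(w):
    # NLL = -sum_i [(1 - y_i) log(1 - s(w^T x_i)) + y_i log s(w^T x_i)]
    p = s(X @ w)
    return -np.sum((1 - y) * np.log(1 - p) + y * np.log(p))

# Closed-form gradient: sum_i x_i (s(w^T x_i) - y_i)
grad = X.T @ (s(X @ w) - y)

# Central finite differences along each coordinate
eps = 1e-6
num = np.array([(nll(w + eps * e) - nll(w - eps * e)) / (2 * eps)
                for e in np.eye(d)])

print(np.allclose(grad, num, atol=1e-4))  # True
```

The two agree to within finite-difference error, which confirms both the sign and the $x_i(s(w^Tx_i) - y_i)$ form.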