Solved – logistic regression: gradient with respect to the weights

gradient descent · logistic · machine learning · weights

I am reading about logistic regression (from https://piazza-resources.s3.amazonaws.com/h61o5linlbb1v0/h8exwp8dmm44ok/classificationV6.pdf?AWSAccessKeyId=AKIAIEDNRLJ4AZKBW6HA&Expires=1485650876&Signature=Rd4BqBgb4hPwWUjxAyxJNfPhklU%3D) and am looking at the negative log likelihood function. They take the gradient with respect to the weights and produce the result at the bottom of page 7. I calculated this myself and can't seem to get the solution that they arrived at.

They set

$$NLL = -\sum_{i=1}^N\left[(1-y_i)\log(1-s(w^Tx_i))+y_i\log(s(w^Tx_i))\right]$$

where $s$ is the sigmoid function $s(x) = \frac{1}{1+e^{-x}}$.
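
For concreteness, here is a minimal NumPy sketch of this loss; the toy data, the shapes, and the names `sigmoid`/`nll` are my own illustration, not anything taken from the linked notes:

```python
import numpy as np

def sigmoid(z):
    # s(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

def nll(w, X, y):
    # NLL = -sum_i [(1 - y_i) log(1 - s(w^T x_i)) + y_i log(s(w^T x_i))]
    p = sigmoid(X @ w)                     # s(w^T x_i) for every row x_i of X
    return -np.sum((1 - y) * np.log(1 - p) + y * np.log(p))

# toy data: 5 points in 3 dimensions, labels in {0, 1}
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
y = rng.integers(0, 2, size=5)
w = rng.normal(size=3)
print(nll(w, X, y))
```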

When I take $\frac{\partial NLL}{\partial w}$, I get
$$ -\sum_{i=1}^N \left( \frac{x_i(y_i-1)e^{w^Tx_i}}{e^{w^Tx_i}+1} + \frac{x_iy_i}{e^{w^Tx_i}+1}\right)$$

and not $$ \sum_{i=1}^N (s(w^Tx_i)-y_i)\,x_i$$

I must be making a mistake since this is just a simple gradient calculation. Can anyone shed some light onto how this was computed?

Best Answer

It is a simple calculation, but one can easily make a mistake. We have

$\frac{\partial s(x)}{\partial x} = s(x)(1 - s(x)) \qquad \frac{\partial s(w^Tx_i)}{\partial w} = x_i(1 - s(w^Tx_i))s(w^Tx_i) \qquad \frac{\partial \log(x)}{\partial x} = \frac{1}{x}$
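
These identities are easy to sanity-check numerically if you are unsure of them; a small sketch (the test point and the step size are arbitrary choices of mine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x, h = 0.7, 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)  # central difference
analytic = sigmoid(x) * (1 - sigmoid(x))               # s(x)(1 - s(x))
print(numeric, analytic)  # the two should agree to many decimal places
```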

Using these identities, the derivative is

$\frac{\partial NLL}{\partial w} = \sum_{i = 1}^{N} \left[(1 - y_i)\frac{x_i(1 - s(w^Tx_i))s(w^Tx_i)}{1 - s(w^Tx_i)} - y_i\frac{x_i(1 - s(w^Tx_i))s(w^Tx_i)}{s(w^Tx_i)}\right]$

Canceling the common factors in each fraction gives $\sum_{i=1}^{N} x_i\left[(1 - y_i)s(w^Tx_i) - y_i(1 - s(w^Tx_i))\right]$, and I checked that this indeed simplifies to

$\sum_{i=1}^{N} x_i(s(w^Tx_i) - y_i)$
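
One way to double-check that simplification is to compare the closed-form gradient against a finite-difference gradient of the NLL; a sketch along those lines, again with made-up data and a made-up helper `grad_nll`:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nll(w, X, y):
    p = sigmoid(X @ w)
    return -np.sum((1 - y) * np.log(1 - p) + y * np.log(p))

def grad_nll(w, X, y):
    # closed form: sum_i x_i (s(w^T x_i) - y_i)
    return X.T @ (sigmoid(X @ w) - y)

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 4))
y = rng.integers(0, 2, size=6)
w = rng.normal(size=4)

# finite-difference gradient of the NLL, one coordinate at a time
h = 1e-6
fd = np.array([(nll(w + h * e, X, y) - nll(w - h * e, X, y)) / (2 * h)
               for e in np.eye(4)])
print(np.max(np.abs(fd - grad_nll(w, X, y))))  # should be close to zero
```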