Logistic Regression – Deriving the Partial Derivative of the Cost Function

calculus, logistic regression, machine learning

I'm not sure how to derive the formula for updating the parameter vector $\theta$ in logistic regression with cost function $$J(\theta)=\frac{-1}{m}\left[\sum_{i=1}^{m}y^{(i)}\log h_{\theta}(x^{(i)})+(1-y^{(i)})\log(1-h_{\theta}(x^{(i)}))\right]$$

Here, $h_{\theta}(x)=\frac{1}{1+e^{-\theta^{T}x}}$, and $(x^{(i)}, y^{(i)})$ is the $i$-th observation in the sample of $m$ observations.

In the lecture, they say that we update each $\theta_j$ by subtracting from it the learning rate times the partial derivative of $J(\theta)$ with respect to $\theta_j$, which gives $$\theta_j:=\theta_j-\alpha\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}$$

Also, in one of the assignments, they gave the option
$$\theta_j:=\theta_j-\alpha\frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}$$
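For context, in vectorized code this update amounts to something like the following minimal NumPy sketch (the array names `X`, `y`, `theta`, `alpha` are mine, not from the course; the two variants above differ only in whether the constant $\frac{1}{m}$ is folded into the learning rate $\alpha$):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent_step(theta, X, y, alpha):
    """One simultaneous update of all theta_j; X is (m, n), y is (m,), theta is (n,)."""
    m = X.shape[0]
    h = sigmoid(X @ theta)          # h_theta(x^(i)) for every observation i
    grad = X.T @ (h - y) / m        # (1/m) * sum_i (h_theta(x^(i)) - y^(i)) * x_j^(i)
    return theta - alpha * grad
```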

But I wasn't able to derive the partial derivative of $J(\theta)$.

Here is what I get:
$$ \frac{\partial J}{\partial \theta_j}=\frac{-1}{m}\sum_{i=1}^mx_j^{(i)}(-h_{\theta}(x^{(i)})y^{(i)}-h_{\theta}(x^{(i)})^2(1-y^{(i)})e^{\theta^T x^{(i)}})$$

Best Answer

Just to keep the number of indices down, let me consider the case with $m=1$, where $x$ and $y$ are scalars rather than vectors. In this case $J$ becomes:

$$J(\theta,x,y) = -y \log(h_\theta(x)) - (1-y) \log(1-h_\theta(x))$$

$$\frac{\partial h_\theta}{\partial \theta}(\theta,x) = \frac{\partial}{\partial \theta}((1+e^{-\theta x})^{-1}) = x e^{-\theta x} h_\theta^2(x)$$
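As an aside, using the fact that $e^{-\theta x} h_\theta(x) = 1-h_\theta(x)$ (which is also used at the end below), this is the familiar sigmoid-derivative identity:

$$\frac{\partial h_\theta}{\partial \theta}(\theta,x) = x\, h_\theta(x)\bigl(1-h_\theta(x)\bigr)$$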

$$\frac{\partial J}{\partial \theta}(\theta,x,y) = -y\frac{x e^{-\theta x} h_\theta^2(x)}{h_\theta(x)} + (1-y)\frac{x e^{-\theta x} h_\theta^2(x)}{1-h_\theta(x)} = x e^{-\theta x} h_\theta^2(x)\left(\frac{1-y}{1-h_{\theta}(x)}-\frac{y}{h_\theta(x)}\right)$$

$$= \frac{x e^{-\theta x} h_\theta(x)}{1-h_\theta (x)}\bigl((1-y)h_\theta (x) - y(1-h_\theta (x))\bigr) = \frac{x e^{-\theta x} h_\theta(x)}{1-h_\theta (x)} (h_\theta(x) - y)$$

Noticing that $e^{-\theta x} h_\theta(x) = 1-h_\theta(x)$, we get $\frac{\partial J}{\partial \theta}(\theta,x,y) = x(h_\theta(x) -y)$.
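Restoring the sum over the $m$ observations and the other components of $\theta$ (the same computation with $x$ replaced by $x_j^{(i)}$ and $\theta x$ by $\theta^T x^{(i)}$, averaged over $i$) gives exactly the gradient used in the assignment's update rule:

$$\frac{\partial J}{\partial \theta_j}(\theta) = \frac{1}{m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)x_j^{(i)}$$

The lecture's version simply drops the constant factor $\frac{1}{m}$, absorbing it into the learning rate $\alpha$. If you want to convince yourself numerically, here is a quick finite-difference check (a sketch with made-up data, not code from the course) comparing this analytic gradient against a numerical one:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    # J(theta) = -(1/m) * sum_i [ y_i*log(h_i) + (1-y_i)*log(1-h_i) ]
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def analytic_grad(theta, X, y):
    # (1/m) * X^T (h - y), the gradient derived above
    return X.T @ (sigmoid(X @ theta) - y) / len(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = (rng.uniform(size=50) < 0.5).astype(float)
theta = rng.normal(size=3)

eps = 1e-6
num_grad = np.array([
    (cost(theta + eps * e, X, y) - cost(theta - eps * e, X, y)) / (2 * eps)
    for e in np.eye(3)
])
print(np.max(np.abs(num_grad - analytic_grad(theta, X, y))))  # should be ~1e-9 or smaller
```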