[Math] Second derivative of the cost function of logistic regression

Tags: derivatives, machine-learning, partial-derivative, regression

Do I have the correct solution for the second derivative of the cost function of logistic regression?

Cost Function
$$J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{i}\log(h_\theta(x^{i}))+(1-y^{i})\log(1-h_\theta(x^{i}))\right]$$

where $h_{\theta}(x)$ is defined as follows

$$h_{\theta}(x)=g(\theta^{T}x)$$
$$g(z)=\frac{1}{1+e^{-z}}$$
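
For concreteness, here is a minimal numpy sketch of these two definitions (the function names and the use of numpy are my additions, not from the question):

```python
import numpy as np

def g(z):
    # logistic (sigmoid) function: g(z) = 1 / (1 + e^{-z}), elementwise
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, x):
    # hypothesis h_theta(x) = g(theta^T x) for a single feature vector x
    return g(theta @ x)
```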

First Derivative
$$ \frac{\partial}{\partial\theta_{j}}J(\theta) =\sum_{i=1}^{m}(h_\theta(x^{i})-y^i)x_j^i$$

Second Derivative
$$
\begin{align*}
\frac{\partial^2}{\partial\theta_{j}^{2}}J(\theta) &= \frac{\partial}{\partial\theta_j}\sum_{i=1}^{m}(h_\theta(x^{i})x_j^i -y^ix^i_j) \\
&= \frac{\partial}{\partial\theta_j}\sum_{i=1}^{m}h_\theta(x^{i})x_j^i \\
&= \frac{\partial}{\partial\theta_j}\sum_{i=1}^{m}\frac{x^{i}_j}{1+e^{-\theta^{T}x^{i}}} \\
&= x^2\, h_\theta(x)^2
\end{align*}
$$

Best Answer

For convenience, define some variables and their differentials $$\eqalign{ z &= X^T\theta, &\,\,\,\, dz = X^Td\theta \cr p &= \exp(z), &\,\,\,\, dp = p\odot dz \cr h &= \frac{p}{1+p}, &\,\,\,\, dh = (h-h\odot h)\odot dz \,= (H-H^2)\,dz \cr }$$ where
$\,\,\,\,H={\rm Diag}(h)$
$\,\,\,\,\odot$ represents the Hadamard elementwise product
$\,\,\,\,\exp$ is applied elementwise
$\,\,\,\,\frac{p}{1+p}$ represents elementwise division
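
As a sanity check, these definitions map directly onto numpy; the sizes and random data below are hypothetical, with the columns of $X$ as samples so that $z=X^T\theta$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 5                        # hypothetical sizes: n features, m samples
X = rng.normal(size=(n, m))        # columns are samples, so z = X^T theta
theta = rng.normal(size=n)

z = X.T @ theta                    # z = X^T theta
p = np.exp(z)                      # exp applied elementwise
h = p / (1.0 + p)                  # elementwise division; the sigmoid of z
H = np.diag(h)                     # H = Diag(h)

# dh = (H - H^2) dz holds because dh_i / dz_i = h_i (1 - h_i)
assert np.allclose(H - H @ H, np.diag(h * (1.0 - h)))
```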


The cost function can be written in terms of these variables and the Frobenius inner product (represented by a colon) $$\eqalign{ J &= -\frac{1}{m}\Big[y:\log(h) + (1-y):\log(1-h)\Big] \cr }$$
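
For vectors the Frobenius inner product reduces to the ordinary dot product, so $J$ is a one-liner; the numbers below are made up for illustration:

```python
import numpy as np

h = np.array([0.8, 0.3, 0.6])      # hypothetical predictions, each in (0, 1)
y = np.array([1.0, 0.0, 1.0])      # hypothetical labels
m = y.size

# J = -(1/m) * [ y : log(h) + (1 - y) : log(1 - h) ]
J = -(y @ np.log(h) + (1.0 - y) @ np.log(1.0 - h)) / m
```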

The differential of the cost is $$\eqalign{ dJ &= -\frac{1}{m}\Big[y:d\log(h) + (1-y):d\log(1-h)\Big] \cr &= -\frac{1}{m}\Big[y:H^{-1}dh - (1-y):(I-H)^{-1}dh\Big] \cr &= -\frac{1}{m}\Big[H^{-1}y - (I-H)^{-1}(1-y)\Big]:dh \cr &= -\frac{1}{m}\Big[H^{-1}y - (I-H)^{-1}(1-y)\Big]:H(I-H)dz \cr &= -\frac{1}{m}\Big[(I-H)y - H(1-y)\Big]:dz \cr &= -\frac{1}{m}(y-h):X^Td\theta \cr &= \frac{1}{m}X(h-y):d\theta \cr }$$ The gradient $$\eqalign{ G =\frac{\partial J}{\partial\theta} &= \frac{1}{m}X(h-y) \cr }$$ (NB: Your gradient is missing the $\frac{1}{m}$ factor)
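
The gradient formula $G=\frac{1}{m}X(h-y)$ is easy to verify by comparing it against central differences of the cost; everything below (names, sizes, data) is a hypothetical test setup:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    m = y.size
    h = sigmoid(X.T @ theta)
    return -(y @ np.log(h) + (1.0 - y) @ np.log(1.0 - h)) / m

rng = np.random.default_rng(1)
n, m = 4, 50
X = rng.normal(size=(n, m))        # columns are samples
y = rng.integers(0, 2, size=m).astype(float)
theta = rng.normal(size=n)

G = X @ (sigmoid(X.T @ theta) - y) / m     # G = (1/m) X (h - y)

# central-difference approximation of dJ/dtheta, one coordinate at a time
eps = 1e-6
G_num = np.array([(cost(theta + eps * e, X, y)
                   - cost(theta - eps * e, X, y)) / (2.0 * eps)
                  for e in np.eye(n)])
assert np.allclose(G, G_num, atol=1e-6)
```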

The differential of the gradient $$\eqalign{ dG &= \frac{1}{m}X\,dh \cr &= \frac{1}{m}X(H-H^2)\,dz \cr &= \frac{1}{m}X(H-H^2)X^T\,d\theta \cr\cr }$$ And finally, the gradient of the gradient (aka the Hessian) $$\eqalign{ \frac{\partial^2J}{\partial\theta\,\partial\theta^T} &= \frac{\partial G}{\partial\theta} = \frac{1}{m}X(H-H^2)X^T \cr }$$
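
In the index notation of the question, the $(j,k)$ entry of this Hessian is $$\frac{\partial^2 J}{\partial\theta_j\,\partial\theta_k}=\frac{1}{m}\sum_{i=1}^{m}h_\theta(x^i)\bigl(1-h_\theta(x^i)\bigr)\,x_j^i\,x_k^i$$ which is what the attempted derivation in the question should reduce to when $j=k$. A numpy sketch checking the matrix formula against central differences of the gradient (all names and data below are hypothetical):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad(theta, X, y):
    # gradient G = (1/m) X (h - y), as derived above
    m = y.size
    return X @ (sigmoid(X.T @ theta) - y) / m

rng = np.random.default_rng(2)
n, m = 4, 50
X = rng.normal(size=(n, m))        # columns are samples
y = rng.integers(0, 2, size=m).astype(float)
theta = rng.normal(size=n)

h = sigmoid(X.T @ theta)
S = np.diag(h * (1.0 - h))         # S = H - H^2
hessian = X @ S @ X.T / m          # (1/m) X (H - H^2) X^T

# each column k approximates dG / dtheta_k via central differences
eps = 1e-6
hessian_num = np.column_stack([(grad(theta + eps * e, X, y)
                                - grad(theta - eps * e, X, y)) / (2.0 * eps)
                               for e in np.eye(n)])
assert np.allclose(hessian, hessian_num, atol=1e-5)
```

Since every diagonal entry $h_i(1-h_i)$ of $H-H^2$ lies in $(0,\tfrac14]$, the Hessian $\frac{1}{m}X(H-H^2)X^T$ is positive semidefinite, which is why the logistic cost is convex in $\theta$.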