[Math] Hessian of the Loss Function (Applying Newton's Method in Logistic Regression)

calculus, derivatives, logistic regression, partial derivative

If the cost function is $L$,
$$ L = -\frac{1}{m}\Big[\, y \log(h(x)) + (1-y)\log(1-h(x)) \,\Big] $$
$$ h(x)=\frac{1}{1+e^{-(w^{T}x+b)}} $$
the first-order partial derivative of $L$ with respect to $w$ is
$$ \frac{\partial L}{\partial w} = \frac{1}{m}\,(h(x) - y)\,x $$
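For reference, this comes from the chain rule, with $z = w^{T}x + b$ and $\frac{dh}{dz} = h(1-h)$:
$$ \frac{\partial L}{\partial w} = -\frac{1}{m}\left(\frac{y}{h} - \frac{1-y}{1-h}\right) h(1-h)\,x = \frac{1}{m}\,(h - y)\,x $$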
Question:
How do I find the second-order partial derivative of $L$ with respect to $w$, that is, $$ \frac{\partial ^{2}L}{\partial w^{2}} $$
so that I can use Newton's method to update the weights $w$ like this:
$$ w_{new} = w_{old} - \left(\frac{\partial ^{2}L}{\partial w^{2}}\right)^{-1} \left( \frac{\partial L}{\partial w} \right) $$
I'm just trying to figure out how Newton's method works with logistic regression.

Best Answer

It is not so clear that you have these concepts down yet, so let us first sort out the definitions.

Start by clarifying inputs and outputs. What you have actually computed is the second derivative of a scalar-valued function of one variable, in other words a $$\mathbb R^{1} \to \mathbb R^{1}$$ function. The Jacobian collects the partial derivatives of every output component with respect to every input variable. For a function $$\cases{x \in \mathbb R^n\\f(x) \in \mathbb R^m}$$

the Jacobian is an $m\times n$ matrix.

For the Hessian to be a matrix, the function $f(x)$ must map $$\mathbb R^{n} \to \mathbb R^{1},$$ which is exactly your situation: the loss $L$ takes the weight vector $w \in \mathbb R^{n}$ to a scalar.
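So the Hessian of your loss with respect to $w$ is an $n \times n$ matrix. For the averaged logistic loss above, writing $X$ for the $m \times n$ matrix whose rows are the examples $x_i$, and $h_i = h(x_i)$, the standard result is

$$ \frac{\partial^{2} L}{\partial w \,\partial w^{T}} = \frac{1}{m}\, X^{T} S X, \qquad S = \operatorname{diag}\big(h_i(1-h_i)\big). $$

Since every diagonal entry $h_i(1-h_i)$ is positive, this Hessian is positive semi-definite, which is why Newton's method behaves well for logistic regression.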

In the more general case

$$\mathbb R^{n} \to \mathbb R^{m}$$

the second derivative is a three-index tensor (an $m \times n \times n$ array), one Hessian per output component.
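Putting the pieces together, here is a minimal sketch of the Newton update for logistic regression in Python. It assumes a design matrix `X` of shape $m \times n$ (with a column of ones absorbing the bias $b$) and labels `y` in $\{0,1\}$; the function name `newton_logistic` and the damping term `eps` are illustrative, not part of the question.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_logistic(X, y, n_iter=10, eps=1e-8):
    """Newton's method for logistic regression.

    X : (m, n) design matrix (include a column of ones for the bias b)
    y : (m,) labels in {0, 1}
    """
    m, n = X.shape
    w = np.zeros(n)
    for _ in range(n_iter):
        h = sigmoid(X @ w)             # predictions h(x_i)
        grad = X.T @ (h - y) / m       # gradient: (1/m) X^T (h - y)
        S = h * (1.0 - h)              # diagonal of S: h_i (1 - h_i)
        H = (X.T * S) @ X / m          # Hessian: (1/m) X^T S X
        H += eps * np.eye(n)           # tiny ridge for numerical stability
        w -= np.linalg.solve(H, grad)  # Newton step: w - H^{-1} grad
    return w
```

In practice one would stop when the gradient norm is small rather than after a fixed number of iterations; the `eps` ridge guards against an ill-conditioned Hessian when the classes are nearly separable.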
