Hessian of sigmoids

calculus, derivatives, hessian-matrix, multivariable-calculus, scalar-fields

I have $q = f^{T} A f$, where $A$ is an $n \times n$ symmetric matrix and $f$ is the $n \times 1$ vector output of a sigmoid function applied element-wise:

$$f = \frac{1}{1 + e^{-Xw}}$$

I want to take the second-order derivative (Hessian) of $q$ w.r.t. the vector $w$, which is $p$-dimensional, so $X$ is an $n \times p$ matrix. So far, I've only succeeded in computing the first-order derivative using the chain rule, which I found to be:

$$\frac{dq}{dw} = 2(Af)^{T}DX$$

where $D$ is a diagonal matrix with diagonal entries $f_{i}(1 - f_{i})$. I checked this gradient against automatic differentiation tools, and it is correct. However, I'm stuck on getting the second-order derivative of $q$. How can I move from the gradient to the Hessian of the quadratic form? Thanks in advance.

Best Answer

As you have found, the gradient can be written (as a column vector) as $\mathbf{g} = 2 \mathbf{X}^T \mathbf{D} \mathbf{A} \sigma(\mathbf{Xw})$, where $\mathbf{D}=\mathrm{diag}[\sigma'(\mathbf{Xw})]$.

Differentiating once more, it follows that $d\mathbf{g} = 2 \mathbf{X}^T [\mathbf{D}_1+ \mathbf{D} \mathbf{A} \mathbf{D}] \mathbf{X}\, d\mathbf{w}$, where $\mathbf{D}_1 =\mathrm{diag}[\mathbf{A} \sigma(\mathbf{Xw}) \circ\sigma''(\mathbf{Xw})]$ and $\circ$ denotes the element-wise product.
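The intermediate step can be spelled out as follows. Here $\mathbf{s}=\sigma(\mathbf{Xw})$ is a shorthand introduced only for this sketch. The product rule applied to $\mathbf{g}$ gives two terms, one from the change in $\mathbf{D}$ and one from the change in $\mathbf{s}$:

$$\begin{aligned}
d\mathbf{g} &= 2\mathbf{X}^T\left[(d\mathbf{D})\,\mathbf{A}\mathbf{s} + \mathbf{D}\mathbf{A}\,d\mathbf{s}\right],\\
d\mathbf{s} &= \mathbf{D}\mathbf{X}\,d\mathbf{w},\\
(d\mathbf{D})\,\mathbf{A}\mathbf{s} &= \left[\sigma''(\mathbf{Xw})\circ \mathbf{X}\,d\mathbf{w}\right]\circ \mathbf{A}\mathbf{s} = \mathrm{diag}\left[\mathbf{A}\mathbf{s}\circ\sigma''(\mathbf{Xw})\right]\mathbf{X}\,d\mathbf{w}.
\end{aligned}$$

The last line uses $d\mathbf{D}=\mathrm{diag}[\sigma''(\mathbf{Xw})\circ \mathbf{X}\,d\mathbf{w}]$ and the fact that a diagonal matrix acting on a vector is an element-wise product, which can be reordered freely.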

Since $d\mathbf{g} = \mathbf{H}\, d\mathbf{w}$, the Hessian is $\mathbf{H}=2 \mathbf{X}^T [\mathbf{D}_1+ \mathbf{D} \mathbf{A} \mathbf{D}] \mathbf{X}$, which is symmetric because $\mathbf{A}$ is symmetric.
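As a sanity check, the formula can be compared with a finite-difference Jacobian of the gradient. The following is a minimal NumPy sketch, not part of the original derivation: the function names (`gradient`, `hessian`, `numerical_jacobian`) are illustrative, and the random $\mathbf{A}$, $\mathbf{X}$ and $\mathbf{w}$ are made-up test data.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient(w, X, A):
    # g = 2 X^T D A sigma(Xw), with D = diag(sigma'(Xw))
    f = sigmoid(X @ w)
    d1 = f * (1.0 - f)                 # sigma'(Xw), the diagonal of D
    return 2.0 * X.T @ (d1 * (A @ f))

def hessian(w, X, A):
    # H = 2 X^T [D1 + D A D] X, with D1 = diag(A sigma(Xw) * sigma''(Xw))
    f = sigmoid(X @ w)
    d1 = f * (1.0 - f)                 # sigma'
    d2 = d1 * (1.0 - 2.0 * f)          # sigma''
    D = np.diag(d1)
    D1 = np.diag((A @ f) * d2)
    return 2.0 * X.T @ (D1 + D @ A @ D) @ X

def numerical_jacobian(func, w, eps=1e-6):
    # Central differences; column j approximates d func / d w_j.
    cols = []
    for j in range(w.size):
        e = np.zeros(w.size)
        e[j] = eps
        cols.append((func(w + e) - func(w - e)) / (2.0 * eps))
    return np.stack(cols, axis=1)

# Made-up test problem (assumption: small random data, symmetric A).
rng = np.random.default_rng(0)
n, p = 6, 4
X = rng.normal(size=(n, p))
B = rng.normal(size=(n, n))
A = (B + B.T) / 2.0
w = rng.normal(size=p)

H = hessian(w, X, A)
H_fd = numerical_jacobian(lambda v: gradient(v, X, A), w)
max_err = np.max(np.abs(H - H_fd))
print("max abs difference:", max_err)
```

If the formula is right, the printed difference should be at the level of finite-difference noise, and `H` should equal its own transpose up to rounding.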

Maybe this expression can be simplified a bit.
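One possible simplification, assuming $\sigma$ is the logistic function as in the question, uses the identities $\sigma' = \sigma(1-\sigma)$ and $\sigma'' = \sigma'(1-2\sigma)$. Because diagonal matrices commute, $\mathbf{D}$ can then be factored out of $\mathbf{D}_1$:

$$\begin{aligned}
\mathbf{D}_1 &= \mathbf{D}\,\mathrm{diag}\left[\mathbf{A}\sigma(\mathbf{Xw})\circ\left(\mathbf{1}-2\sigma(\mathbf{Xw})\right)\right],\\
\mathbf{H} &= 2\mathbf{X}^T\mathbf{D}\left(\mathrm{diag}\left[\mathbf{A}\sigma(\mathbf{Xw})\circ\left(\mathbf{1}-2\sigma(\mathbf{Xw})\right)\right] + \mathbf{A}\mathbf{D}\right)\mathbf{X}.
\end{aligned}$$

Here $\mathbf{1}$ denotes the all-ones vector of length $n$, a symbol introduced only for this sketch.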
