Solved – Computing the Hessian of maximum log likelihood function

Tags: extreme-value, hessian, likelihood, logarithm

I am trying to find the Hessian matrix of the maximum log-likelihood function, given training data $\{(x_i, y_i)\}_{i=1}^N$ with $y_i \in \{+1, -1\}$ for each $i = 1, \dots, N$:
[image: the log-likelihood function]

When I try to find the Hessian matrix for this function, I get this:
[image: the attempted Hessian computation]

which gives a vector, not a number, which doesn't seem to be correct. How do you calculate the off-diagonal entries of the Hessian matrix without the result being a vector?

This is what I get for the diagonals of the Hessian and the gradients:

[image: the diagonal Hessian entries and the gradients]

Best Answer

...which gives a vector, not a number...

If you differentiate a multivariable function $f$ with respect to a vector $\mathbf{w}$, you get a derivative that is itself a vector (the gradient). To obtain scalar second derivatives, you need to differentiate with respect to the individual elements of $\mathbf{w}$ rather than the whole vector. In this particular case, if $\mathbf{w} = (w_1, \dots, w_m)$ and $b$ is the scalar intercept, then the Hessian matrix consists of the following scalar second-order partial derivatives:

$$\mathbf{H}(b,\mathbf{w}) = \begin{bmatrix} \frac{\partial^2 f}{\partial b^2}(b, \mathbf{w}) & \frac{\partial^2 f}{\partial b \partial w_1}(b, \mathbf{w}) & \cdots & \frac{\partial^2 f}{\partial b \partial w_m}(b, \mathbf{w}) \\ \frac{\partial^2 f}{\partial b \partial w_1}(b, \mathbf{w}) & \frac{\partial^2 f}{\partial w_1^2}(b, \mathbf{w}) & \cdots & \frac{\partial^2 f}{\partial w_1 \partial w_m}(b, \mathbf{w}) \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial b \partial w_m}(b, \mathbf{w}) & \frac{\partial^2 f}{\partial w_1 \partial w_m}(b, \mathbf{w}) & \cdots & \frac{\partial^2 f}{\partial w_m^2}(b, \mathbf{w}) \\ \end{bmatrix}.$$
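You can check this layout numerically: treat $(b, w_1, \dots, w_m)$ as $m+1$ scalar parameters and take finite differences in each pair of them, which yields an $(m+1)\times(m+1)$ matrix of scalars rather than a vector. A minimal sketch in Python, assuming (since the original function is in an image) that $f$ is the usual negative log-likelihood for labels $y_i \in \{+1,-1\}$, namely $f(b,\mathbf{w}) = \sum_i \log(1 + e^{-y_i(b + \mathbf{w} \cdot x_i)})$:

```python
import numpy as np

def f(b, w, X, y):
    """Assumed form of the objective: logistic negative log-likelihood
    for labels y_i in {+1, -1} (the actual f is in the question's image)."""
    z = y * (b + X @ w)
    return np.sum(np.log1p(np.exp(-z)))

def hessian(b, w, X, y, eps=1e-5):
    """Central finite differences over the scalar parameters (b, w_1, ..., w_m).
    Each entry H[i, j] is a scalar second partial derivative, matching the
    matrix of partials in the answer above."""
    theta = np.concatenate(([b], w))      # stack b and w into one parameter vector
    n = theta.size                        # n = m + 1 scalar parameters
    H = np.zeros((n, n))
    g = lambda t: f(t[0], t[1:], X, y)
    for i in range(n):
        for j in range(n):
            tpp = theta.copy(); tpp[i] += eps; tpp[j] += eps
            tpm = theta.copy(); tpm[i] += eps; tpm[j] -= eps
            tmp = theta.copy(); tmp[i] -= eps; tmp[j] += eps
            tmm = theta.copy(); tmm[i] -= eps; tmm[j] -= eps
            H[i, j] = (g(tpp) - g(tpm) - g(tmp) + g(tmm)) / (4 * eps**2)
    return H

# Hypothetical data just to exercise the code: N = 20 points in m = 3 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = np.sign(rng.normal(size=20))
H = hessian(0.1, np.array([0.2, -0.3, 0.5]), X, y)
print(H.shape)                          # (4, 4): one row/column per scalar parameter
print(np.allclose(H, H.T, atol=1e-4))   # True: mixed partials agree, so H is symmetric
```

Note the symmetry $\partial^2 f / \partial w_i \partial w_j = \partial^2 f / \partial w_j \partial w_i$ exploited in the displayed matrix: only the upper (or lower) triangle of entries actually needs to be computed.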