Jacobian, Hessian, and Gradient – Understanding the Connection

multivariable-calculus

In the Wikipedia article on the Jacobian matrix, they have this to say about the gradient:

If $m = 1$, $\mathbf{f}$ is a scalar field and the Jacobian matrix is reduced to a row vector of partial derivatives of $\mathbf{f}$—i.e. the gradient of $\mathbf{f}$.

As well as

The Jacobian of the gradient of a scalar function of several variables has a special name: the Hessian matrix, which in a sense is the "second derivative" of the function in question.

So I tried doing the calculations, and was stumped.

If we let $f: \mathbb{R}^n \to \mathbb{R}$, then
$$Df = \begin{bmatrix}
\frac{\partial f}{\partial x_1} & \dots & \frac{\partial f}{\partial x_n}
\end{bmatrix} = \nabla f$$
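(To sanity-check this, here is a quick symbolic computation with sympy; the particular $f$ is just an arbitrary example I picked.)

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2', real=True)
f = x1**2 * sp.sin(x2)  # an arbitrary scalar field f: R^2 -> R

# The Jacobian of a scalar field is the 1 x n row of partials, i.e. the gradient
Df = sp.Matrix([f]).jacobian([x1, x2])
print(Df)  # Matrix([[2*x1*sin(x2), x1**2*cos(x2)]])
```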
So far so good, but when I try to calculate the Jacobian matrix of the gradient I get
$$D^2f = \begin{bmatrix}
\frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_2 \partial x_1} & \dots & \frac{\partial^2 f}{\partial x_n \partial x_1} \\
\frac{\partial^2 f}{\partial x_1 \partial x_2} & \frac{\partial^2 f}{\partial x_2^2} & \dots & \frac{\partial^2 f}{\partial x_n \partial x_2} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 f}{\partial x_1 \partial x_n} & \frac{\partial^2 f}{\partial x_2 \partial x_n} & \dots & \frac{\partial^2 f}{\partial x_n^2}
\end{bmatrix}$$
According to the article, this is not the Hessian matrix but rather its transpose, and from what I can gather the Hessian is not symmetric in general.

So I have two questions: is the gradient generally thought of as a row vector? And did I do something wrong when I calculated the Jacobian of the gradient of $f$, or is the Wikipedia article incorrect?

Best Answer

You did not do anything wrong in your calculation. If you directly compute the Jacobian of the gradient of $f$ with the conventions you used, you will end up with the transpose of the Hessian. This is noted more clearly in the introduction to the Hessian matrix article on Wikipedia (https://en.wikipedia.org/wiki/Hessian_matrix), where it says

The Hessian matrix can be considered related to the Jacobian matrix by $\mathbf{H}(f(\mathbf{x})) = \mathbf{J}(\nabla f(\mathbf{x}))^T$.

The other Wikipedia article should probably update its language to match.
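If it helps, the identity is easy to check symbolically. Here is a sketch with sympy; the $f$ below is just an arbitrary smooth example, so by Schwarz's theorem its Hessian is symmetric and $\mathbf{H}$ and $\mathbf{H}^T$ happen to coincide:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2', real=True)
f = x1**2 * sp.sin(x2)  # an arbitrary smooth scalar field

grad_f = sp.Matrix([f]).jacobian([x1, x2])  # gradient as a 1 x 2 row vector
J_grad = grad_f.jacobian([x1, x2])          # entry (i, j) = d/dx_j (df/dx_i)
H = sp.hessian(f, [x1, x2])                 # entry (i, j) = d/dx_i (df/dx_j)

print(sp.simplify(J_grad - H.T))  # zero matrix: J(grad f) = H^T
print(sp.simplify(H - H.T))       # also zero here, since f is C^2
```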

As for the gradient of $f$ is being defined as a row vector, that is the way I have seen it more often, but it is noted https://en.wikipedia.org/wiki/Matrix_calculus#Layout_conventions that there are competing conventions for general matrix derivatives. However, I don't think that should change your answer for the Hessian- with the conventions you are using, you are correct that it should be transposed.