[Math] Which is the correct Hessian matrix (the standard matrix of a bilinear form)

analysis, euclidean-geometry, matrices, multivariable-calculus, real-analysis

[Image: the Hessian matrix as given in the source]

Please note the typo in the first entry: $\frac {\partial^2f} {\partial x_1\partial x_2}$ should instead be $\frac{\partial^2f} {\partial x_1\partial x_1}$.

Also, this Hessian matrix need not be symmetric, since the second-order partial derivatives need not be continuous.

The convention in use is: $\frac {\partial^2f} {\partial x_1\partial x_2}$ means $\partial_{x_1} (\partial_{x_2}f)$.
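A standard example (not taken from the question) shows that the order really matters once the second partials are discontinuous. Writing $(x,y)$ for $(x_1,x_2)$, let $f(x,y)=\frac{xy(x^2-y^2)}{x^2+y^2}$ for $(x,y)\neq(0,0)$ and $f(0,0)=0$. Directly from the definition of the partial derivative,
$$\partial_x f(0,y)=\lim_{h\to0}\frac{f(h,y)}{h}=-y,\qquad \partial_y f(x,0)=\lim_{k\to0}\frac{f(x,k)}{k}=x,$$
and $f$ vanishes on both axes, so $\partial_x f(x,0)=\partial_y f(0,y)=0$. With the convention above,
$$\frac{\partial^2 f}{\partial x\,\partial y}(0,0)=\partial_x(\partial_y f)(0,0)=1,\qquad \frac{\partial^2 f}{\partial y\,\partial x}(0,0)=\partial_y(\partial_x f)(0,0)=-1,$$
so $\left(\frac{\partial^2 f}{\partial x_i\,\partial x_j}(0,0)\right)=\begin{pmatrix}0&1\\-1&0\end{pmatrix}$, which differs from its transpose.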

Question: Shouldn't the correct matrix be the transpose of what is given?

My reasoning is this:

Let $\{e_1, …, e_n\}$ be the standard basis for $\mathbb R^n$.

Because the matrix is that of the bilinear form $D^2f(x)$, by definition its $ij$-entry is given by
$$[D^2f(x)](e_i,e_j)=[[D(Df)(x)] (e_i)](e_j) =\frac {\partial^2f} {\partial x_j\partial x_i} \neq \frac {\partial^2f} {\partial x_i\partial x_j}.$$
In other words, in the $ij$-entry, the derivative with respect to the $i$th variable should be taken before the one with respect to the $j$th variable, not the other way round.

Referring to the proof is not helpful as it is cryptic and makes no sense:
[Images: the proof from the source]
Information on how $D^2f(x)$ is defined:
[Images: the definition of $D^2f(x)$ from the source]

Best Answer

$\newcommand{\hom}{\operatorname{Hom}}$Let $A \subseteq U=V=\mathbb{R}^n$ be the domain of $f$. We have
\begin{align*} f:& A \to \mathbb{R}\\ Df:& A \to \hom(V,\mathbb{R})\\ D^2f:=D(Df):& A \to \hom(U, \hom(V,\mathbb{R})) \end{align*}
Note that the codomain of $Df$ is $\hom(V,\mathbb{R})$ (the set of all linear maps from $V$ to $\mathbb{R}$), which is not a Euclidean space. So what do we mean by $D(Df)$? Since $\hom(V,\mathbb{R})$ is isomorphic to $\mathbb{R}^n$, the answer is that we first identify each linear functional $L_i$ defined by $L_i(e_j)=\delta_{ij}$ with the vector $e_i$. Since $Df(x)=\sum_i \frac{\partial f}{\partial x_i}(x)\,L_i$, the linear map $Df(x)$ is thereby identified with the gradient vector $\nabla f(x)=\left(\frac{\partial f}{\partial x_1},\,\frac{\partial f}{\partial x_2},\,\ldots,\,\frac{\partial f}{\partial x_n}\right)^\top$ before a second derivative is taken.
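The identification of $Df(x)$ with $\nabla f(x)$ can be checked numerically: applying the linear map $Df(x)$ to a vector $v$ (a directional derivative) gives the same number as $\nabla f(x)^\top v$. This is a minimal sketch, assuming a hypothetical smooth test function $f(x_1,x_2)=x_1^2x_2+\sin x_2$ (not from the question) and a central-difference approximation of $Df(x)$; the helper names `Df` and `grad_f` are illustrative only.

```python
import math

def f(x1, x2):
    # Hypothetical smooth test function (not from the question).
    return x1**2 * x2 + math.sin(x2)

def grad_f(x1, x2):
    # Gradient of f computed by hand: (df/dx1, df/dx2).
    return (2 * x1 * x2, x1**2 + math.cos(x2))

def Df(x1, x2, t=1e-5):
    """Return Df(x) as a linear map V -> R, i.e. v |-> directional derivative
    of f at x along v, approximated by a central difference with step t."""
    def linear_map(v):
        v1, v2 = v
        return (f(x1 + t * v1, x2 + t * v2) - f(x1 - t * v1, x2 - t * v2)) / (2 * t)
    return linear_map

x, v = (1.0, 2.0), (3.0, -1.0)
lhs = Df(*x)(v)                                  # the linear map applied to v
rhs = sum(g * c for g, c in zip(grad_f(*x), v))  # gradient vector dotted with v
print(lhs, rhs)  # both approximately 11 - cos(2)
```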

Having this identification, the matrix of $D^2f(x)$, as a linear operator, is the Jacobian matrix of $\nabla f$ at $x$, i.e. $J(\nabla f)(x)=\left(\frac{\partial^2 f}{\partial x_j \partial x_i}\right)$, whose $(i,j)$-entry is $\partial_{x_j}\left(\frac{\partial f}{\partial x_i}\right)$: row $i$ is the derivative of the $i$th component of $\nabla f$. Therefore, $D^2f(x)(u)$ is represented by the vector $\left(\frac{\partial^2 f}{\partial x_j \partial x_i}\right)u$ and, since transposing a matrix swaps the roles of $i$ and $j$, $$ \left(D^2f(x)(u)\right)(v) = \left[\left(\frac{\partial^2 f}{\partial x_j \partial x_i}\right)u\right]^\top v = u^\top \left(\frac{\partial^2 f}{\partial x_j \partial x_i}\right)^\top v = u^\top \left(\frac{\partial^2 f}{\partial x_i \partial x_j}\right)v.\tag{1} $$

Technically, $D^2f(x)$ is not a bilinear form, but a linear operator that maps vectors to linear operators. Yet, the mapping $B_x:U\times V\to\mathbb{R}$ given by $B_x(u,v)=\left(D^2f(x)(u)\right)(v)$ is bilinear. Therefore we can identify $D^2f(x)$ with the bilinear form $B_x$. And when we speak of the matrix of $D^2f(x):U\times V\to\mathbb{R}$, we actually mean the matrix of $B_x$ (and $D^2f(x)$ is not really a function from $U\times V$ to $\mathbb{R}$ in the first place).
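The bookkeeping in $(1)$ can be sketched numerically. Assuming the standard example $f(x_1,x_2)=x_1x_2(x_1^2-x_2^2)/(x_1^2+x_2^2)$ with $f(0,0)=0$ (not from the question), whose mixed partials at the origin differ, the sketch below approximates $J(\nabla f)(0)$ by nested central differences, evaluates $B_x(e_i,e_j)=(Je_i)^\top e_j$, and shows that the resulting matrix is $J^\top$. The step sizes and the helper names `grad_f`, `jacobian_of_grad` and `B` are arbitrary, hypothetical choices for illustration.

```python
def f(x, y):
    # Standard example (not from the question): mixed partials differ at the origin.
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y * (x**2 - y**2) / (x**2 + y**2)

H_IN, H_OUT = 1e-6, 1e-3  # inner step much smaller than the outer step

def grad_f(x, y):
    # Numerical gradient (df/dx1, df/dx2) via central differences with step H_IN.
    return [(f(x + H_IN, y) - f(x - H_IN, y)) / (2 * H_IN),
            (f(x, y + H_IN) - f(x, y - H_IN)) / (2 * H_IN)]

def jacobian_of_grad(x, y):
    # J[i][j] = d_{x_j}(d_{x_i} f): row i differentiates the i-th gradient component.
    gxp, gxm = grad_f(x + H_OUT, y), grad_f(x - H_OUT, y)
    gyp, gym = grad_f(x, y + H_OUT), grad_f(x, y - H_OUT)
    return [[(gxp[i] - gxm[i]) / (2 * H_OUT), (gyp[i] - gym[i]) / (2 * H_OUT)]
            for i in range(2)]

def B(J, u, v):
    # B_x(u, v) = (D^2 f(x)(u))(v) = (J u)^T v
    Ju = [sum(J[i][k] * u[k] for k in range(2)) for i in range(2)]
    return sum(Ju[i] * v[i] for i in range(2))

J = jacobian_of_grad(0.0, 0.0)
e = [[1.0, 0.0], [0.0, 1.0]]
# Matrix of B_x in the standard basis: entry (i, j) is B_x(e_i, e_j).
B_matrix = [[B(J, e[i], e[j]) for j in range(2)] for i in range(2)]
print(J)         # approximately [[0, -1], [1, 0]]
print(B_matrix)  # approximately [[0, 1], [-1, 0]]
```

Since $B_x(e_i,e_j)$ picks out row $j$, column $i$ of $J$, the matrix of $B_x$ is the transpose of the Jacobian of $\nabla f$; here its $(1,2)$-entry is $\partial_{x_1}(\partial_{x_2}f)(0)\approx 1$.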

By convention, if $\mathbf{u},\mathbf{v}$ and $\mathbf{B}$ represent, respectively, two vectors $u,v$ and a bilinear form $b(u,v)$ with respect to some basis, then $b(u,v)=\mathbf{u}^\top \mathbf{B}\mathbf{v}$; in particular, with respect to the standard basis, the $(i,j)$-entry of $\mathbf{B}$ is $b(e_i,e_j)$. Therefore, from $(1)$, we see that the matrix of $B_x$ with respect to the standard basis is $\left(\frac{\partial^2 f}{\partial x_i \partial x_j}\right)$, which is the matrix given in the question, not its transpose.