Let $f\colon \mathbb{R}^n\rightarrow \mathbb{R}$ be smooth and let $\phi \in GL(n)$. What is the Hessian matrix $H_{f\circ \phi} = \left(\frac{\partial ^2 (f\circ \phi)}{\partial x_i\partial x_j}\right)_{ij}$?
Real Analysis – Chain Rule for Hessian Matrix
analysis, multivariable-calculus, real-analysis, vector-analysis
Related Solutions
The determinant is the product of the eigenvalues. In two dimensions, a positive determinant means that the two eigenvalues have the same sign, so near the point $f$ is either concave or convex, and if the first derivative vanishes there the point is a local extremum. In higher dimensions no such conclusion holds: the determinant can come out positive (or negative) with a mix of positive and negative eigenvalues, so its sign alone does not tell you whether a critical point is an extremum.
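A quick numerical illustration of the last point, using a hypothetical three-dimensional example: the diagonal Hessian below has positive determinant, yet its eigenvalues have mixed signs, so a critical point with this Hessian is a saddle.

```python
import numpy as np

# Hessian of the (made-up) function f(x, y, z) = x^2 - y^2/2 - z^2
# at its critical point, the origin:
H = np.diag([2.0, -1.0, -2.0])

det = np.linalg.det(H)        # product of eigenvalues: 2 * (-1) * (-2) = 4
eigs = np.linalg.eigvalsh(H)  # eigenvalues in ascending order

print(det)   # positive determinant...
print(eigs)  # ...but mixed signs: the origin is a saddle, not an extremum
```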
$\newcommand{\hom}{\operatorname{Hom}}$Let us write $A \subseteq U=V=\mathbb{R}^n$. We have \begin{align*} f:& A \to \mathbb{R}\\ Df:& A \to \hom(V,\mathbb{R})\\ D^2f:=D(Df):& A \to \hom(U, \hom(V,\mathbb{R})) \end{align*} Note that the codomain of $Df$ is $\hom(V,\mathbb{R})$ (the set of all linear maps from $V$ to $\mathbb{R}$), which is not a Euclidean space. So, what do we mean by $D(Df)$? Since $\hom(V,\mathbb{R})$ is isomorphic to $\mathbb{R}^n$, the answer is that we first identify each dual-basis functional $L_i\colon e_j\mapsto\delta_{ij}$ with the vector $e_i$. Hence the linear map $Df(x)$ is identified with the gradient vector $\nabla f(x)=\left(\frac{\partial f}{\partial x_1},\,\frac{\partial f}{\partial x_2},\,\ldots,\,\frac{\partial f}{\partial x_n}\right)^\top$ before taking a second derivative.
Having this identification, the matrix of $D^2f$, as a linear operator, is the Jacobian matrix of $\nabla f$, i.e. $J(\nabla f)(x)=\left(\frac{\partial^2 f}{\partial x_j \partial x_i}\right)$. Therefore, $D^2f(x)(u)$ is represented by the vector $\left(\frac{\partial^2 f}{\partial x_j \partial x_i}\right)u$ and $$ \left(D^2f(x)(u)\right)(v) = \left[\left(\frac{\partial^2 f}{\partial x_j \partial x_i}\right)u\right]^\top v = u^\top \left(\frac{\partial^2 f}{\partial x_i \partial x_j}\right)v.\tag{1} $$ Technically, $D^2f(x)$ is not a bilinear form, but a linear operator that maps vectors to linear operators. Yet, the mapping $B_x:U\times V\to\mathbb{R}$ by $B_x(u,v)=\left(D^2f(x)(u)\right)(v)$ is bilinear. Therefore we can identify $D^2f(x)$ with the bilinear form $B_x$. And when we speak of the matrix of $D^2f(x):U\times V\to\mathbb{R}$, we actually mean the matrix of $B_x$ (and $D^2f(x)$ is not really a function from $U\times V$ to $\mathbb{R}$ in the first place).
By convention, if $\mathbf{u},\mathbf{v}$ and $\mathbf{B}$ represent respectively two vectors $u,v$ and a bilinear form $b(u,v)$ with respect to some basis, then $b(u,v)=\mathbf{u}^\top \mathbf{B}\mathbf{v}$. Therefore, from $(1)$, we see that the matrix of $B_x$ with respect to the standard basis is $\left(\frac{\partial^2 f}{\partial x_i \partial x_j}\right)$.
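As a numerical sanity check of $(1)$, one can approximate the matrix $\left(\frac{\partial^2 f}{\partial x_i \partial x_j}\right)$ by central finite differences and evaluate the bilinear form $B_x(u,v)=u^\top H v$ for a made-up smooth function $f$ (the function, points, and vectors below are arbitrary choices for illustration):

```python
import numpy as np

# Hypothetical example function f: R^3 -> R
def f(x):
    return np.sin(x[0]) * x[1]**2 + x[2]

def hessian_fd(func, x, h=1e-5):
    """Central-difference approximation of (d^2 func / dx_i dx_j) at x."""
    n = len(x)
    I = np.eye(n)
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            H[i, j] = (func(x + h*(I[i] + I[j])) - func(x + h*(I[i] - I[j]))
                       - func(x - h*(I[i] - I[j])) + func(x - h*(I[i] + I[j]))) / (4*h*h)
    return H

x = np.array([0.3, -1.2, 0.5])
H = hessian_fd(f, x)

# Analytic second partials of f, for comparison
H_exact = np.array([[-np.sin(x[0])*x[1]**2, 2*np.cos(x[0])*x[1], 0.0],
                    [ 2*np.cos(x[0])*x[1],  2*np.sin(x[0]),     0.0],
                    [ 0.0,                  0.0,                0.0]])

u = np.array([1.0, 2.0, -1.0])
v = np.array([0.5, 0.0, 3.0])

print(np.allclose(H, H_exact, atol=1e-4))  # finite differences match the analytic H
print(np.isclose(u @ H @ v, v @ H @ u))    # B_x(u, v) = B_x(v, u): symmetry of mixed partials
```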
Best Answer
Denote by $H_g(x)$ the Hessian matrix of a function $g$, and set $g=f\circ \phi$. By the chain rule, $$Dg(x)\cdot h=Df(\phi x)\cdot D\phi(x)\cdot h=Df(\phi x)\cdot \phi\cdot h,$$ since $\phi$ is linear, hence $Dg(x)=Df(\phi x)\cdot \phi$. In particular, $$\partial_j g(x)=\sum_{k=1}^n\partial_kf(\phi x)\,a_{kj},$$ where $a_{kj}$ is the $(k,j)$-th entry of $\phi$. Applying the same argument, for each fixed $k$, to the map $x\mapsto \partial_kf(\phi x)$, we get \begin{align} \partial_{ij} g(x)&=\sum_{k,l=1}^n(H_f(\phi x))_{lk}a_{li}a_{kj}\\ &=\sum_{k=1}^n(\phi^\top H_f(\phi x))_{ik}a_{kj}\\ &=(\phi^\top H_f(\phi x)\phi)_{ij}, \end{align} that is, $$H_{f\circ\phi}(x)=\phi^\top H_f(\phi x)\,\phi.$$
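The identity $H_{f\circ\phi}(x)=\phi^\top H_f(\phi x)\,\phi$ can be checked numerically; here is a sketch with a made-up smooth $f$ on $\mathbb{R}^2$ and an arbitrary invertible matrix $A$ playing the role of $\phi\in GL(2)$, comparing a finite-difference Hessian of $f\circ\phi$ against $A^\top H_f(Ax)\,A$:

```python
import numpy as np

# Hypothetical example function f: R^2 -> R
def f(y):
    return np.exp(y[0]) * y[1] + y[1]**3

def hessian_fd(func, x, h=1e-5):
    """Central-difference approximation of the Hessian of func at x."""
    n = len(x)
    I = np.eye(n)
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            H[i, j] = (func(x + h*(I[i] + I[j])) - func(x + h*(I[i] - I[j]))
                       - func(x - h*(I[i] - I[j])) + func(x - h*(I[i] + I[j]))) / (4*h*h)
    return H

A = np.array([[1.0, 2.0],
              [0.5, -1.0]])          # det = -2, so A is invertible: A in GL(2)
g = lambda x: f(A @ x)               # g = f ∘ φ

x = np.array([0.2, 0.7])
lhs = hessian_fd(g, x)               # H_g(x), computed directly
rhs = A.T @ hessian_fd(f, A @ x) @ A # φ^T H_f(φx) φ, from the chain-rule formula

print(np.allclose(lhs, rhs, atol=1e-3))  # the two sides agree up to discretization error
```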