The Hessian matrix of $x\mapsto f(Ax+b)$

Tags: analysis, calculus, derivatives, multivariable-calculus

Let

  • $A\in\mathbb{R}^{n\times n}$ and $b\in\mathbb{R}^n$
  • $f\in C^2(\mathbb{R}^n)$ and $\tilde{f}(x):=f(Ax+b)$ for $x\in\mathbb{R}^n$

It's easy to prove that $$\nabla\tilde{f}(x)=A^T\nabla f(Ax+b),$$ but I'm not able to prove that the Hessian matrix is $$\nabla^2\tilde{f}(x)=A^T\nabla^2 f(Ax+b)A.$$


Shouldn't we have $$\nabla^2\tilde{f}(x)=\left(\begin{matrix}\frac{\partial}{\partial x_1}\\\vdots\\\frac{\partial}{\partial x_n} \end{matrix}\right)A^T\nabla f(x)=A\left(\begin{matrix}\frac{\partial}{\partial x_1}\\\vdots\\\frac{\partial}{\partial x_n} \end{matrix}\right)\nabla f(x)=A\nabla^2f(x)\;?$$

Best Answer

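Let $f: \mathbb{R}^{2}\rightarrow \mathbb{R}$, and write $p_0=(x_0,y_0)^T$ and $v=(x,y)^T$. Since $f\in C^2$, Taylor's theorem gives, for small $v$, $$ f(p_0+v)\approx f(p_0)+Df(p_0)\,v+\frac{1}{2}v^{T}D^2f(p_0)\,v, $$ where $Df(p_0)$ is the Fréchet derivative (a row vector) and $D^2f(p_0)$ is the Hessian.

Now let $g=f\circ A$. Because $A$ is linear, $g(p_0+v)=f(Ap_0+Av)$, so applying the expansion of $f$ at the point $Ap_0$ with increment $Av$ gives $$ g(p_0+v)\approx f(Ap_0)+Df(Ap_0)\,Av+\frac{1}{2}(Av)^{T}D^2f(Ap_0)\,(Av). $$ Comparing the first-order terms, $$ Dg(p_0)=Df(Ap_0)\,A, $$ and since $(Av)^{T}=v^TA^T$ in general, comparing the second-order terms gives $$ D^2g(p_0)=A^{T}D^{2}f(Ap_0)\,A. $$ The same reasoning carries over to all of $\mathbb{R}^{n}$, and replacing $Ax$ by $Ax+b$ only shifts the point at which $f$ is expanded. As for the gradient: transposing $Dg(p_0)=Df(Ap_0)\,A$ gives $\nabla g(p_0)=A^T\nabla f(Ap_0)$, so a formula of the form $\nabla g=A^T\nabla f$ is correct only if $\nabla f$ is evaluated at $Ap_0$ (at $Ax+b$ in your setting). Differentiating that inner argument by the chain rule is what produces the extra factor $A$ on the right, which your computation drops.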
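The formulas $\nabla\tilde f(x)=A^T\nabla f(Ax+b)$ and $\nabla^2\tilde f(x)=A^T\nabla^2 f(Ax+b)A$ can also be sanity-checked numerically. The following is only a minimal sketch in Python with NumPy. The test function `f` (log-sum-exp), the random `A`, `b`, `x`, and the helper names `numeric_grad` and `numeric_hess` are assumptions chosen for illustration and are not part of the question.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))   # arbitrary square matrix (illustrative)
b = rng.standard_normal(n)        # arbitrary shift (illustrative)

def f(y):
    # Log-sum-exp: a smooth test function, chosen only as an example.
    return np.log(np.sum(np.exp(y)))

def grad_f(y):
    # Analytic gradient of log-sum-exp: the softmax of y.
    p = np.exp(y - np.max(y))
    return p / p.sum()

def hess_f(y):
    # Analytic Hessian of log-sum-exp: diag(p) - p p^T.
    p = grad_f(y)
    return np.diag(p) - np.outer(p, p)

def f_tilde(x):
    return f(A @ x + b)

def numeric_grad(fun, x, h=1e-6):
    # Central differences, one coordinate at a time.
    E = h * np.eye(x.size)
    return np.array([(fun(x + E[i]) - fun(x - E[i])) / (2 * h)
                     for i in range(x.size)])

def numeric_hess(fun, x, h=1e-4):
    # Central second differences for every pair (i, j).
    m = x.size
    E = h * np.eye(m)
    H = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            H[i, j] = (fun(x + E[i] + E[j]) - fun(x + E[i] - E[j])
                       - fun(x - E[i] + E[j]) + fun(x - E[i] - E[j])) / (4 * h * h)
    return H

x = rng.standard_normal(n)
y = A @ x + b                     # the point at which f is differentiated

H_num = numeric_hess(f_tilde, x)
grad_err = np.max(np.abs(numeric_grad(f_tilde, x) - A.T @ grad_f(y)))
hess_err = np.max(np.abs(H_num - A.T @ hess_f(y) @ A))
wrong_err = np.max(np.abs(H_num - A @ hess_f(y)))   # candidate from the question

print("gradient error:        ", grad_err)
print("Hessian error:         ", hess_err)
print("error of A * Hess f:   ", wrong_err)
```

The first two errors should sit at the level of finite-difference noise, while the last one should not. The candidate $A\nabla^2 f$ is in general not even symmetric, so it cannot be a Hessian of a $C^2$ function.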