[Math] Gradient and Hessian of $g(x) = f(Ax + b)$

derivatives hessian-matrix multivariable-calculus scalar-fields

Given scalar field $f : \Bbb R^m \to \Bbb R$, matrix $A \in \Bbb R^{m \times n}$ and vector $b \in \Bbb R^m$, find the gradient and the Hessian of the scalar field $g : \Bbb R^n \to \Bbb R$ defined by $g(x) := f(Ax + b)$.


I cannot find the expression for the derivative:
$g'(x) = f'(Ax + b)*(Ax + b)'$

I believe the derivative $f'(Ax + b)$ is simply $A$ times the partial derivatives, but I do not know how to proceed with the other terms. I know the expressions for the gradient and the Hessian, but I have never seen them in matrix form.

Best Answer

First, observe that if we can write $g(x+\Delta x)=g(x)+[h(x)]^T(\Delta x)+o(\Delta x)$, where the remainder $o(\Delta x)$ satisfies $\lim_{\Delta x\to 0}\frac{o(\Delta x)}{\|\Delta x\|}=0$, then $\nabla g(x)=h(x)$. Using the differentiability of $f$ at the point $Ax+b$, \begin{align*} g(x+\Delta x) &= f(Ax+b+A\Delta x) \\ &= f(Ax+b) + [\nabla f(Ax+b)]^T(A\Delta x)+o(A\Delta x) \\ &= g(x)+[A^T\nabla f(Ax+b)]^T (\Delta x)+o(A\Delta x), \end{align*} where $o(A\Delta x)$ satisfies $\lim_{A\Delta x\to 0}\frac{o(A\Delta x)}{\|A\Delta x\|}=0.$ Since $\|A\Delta x\|\le\|A\|\,\|\Delta x\|$, we have $\frac{|o(A\Delta x)|}{\|\Delta x\|}\le\|A\|\,\frac{|o(A\Delta x)|}{\|A\Delta x\|}$ whenever $A\Delta x\ne 0$, and the remainder is exactly zero when $A\Delta x=0$. Therefore $\lim_{\Delta x\to 0}\frac{o(A\Delta x)}{\|\Delta x\|}=0$. Hence $\nabla g(x)=A^T\nabla f(Ax+b)$.
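As a sanity check, the formula $\nabla g(x)=A^T\nabla f(Ax+b)$ can be compared against central finite differences. The sketch below is only an illustration: the test function $f(y)=e^{y_1}+e^{y_2}+y_1y_2$, the matrix `A`, the vector `b`, and the point `x0` are arbitrary choices made here and are not part of the question.

```python
import math

# Illustrative data (assumptions, not from the question): m = 2, n = 3.
A = [[1.0, 2.0, -1.0],
     [0.5, -1.0, 3.0]]
b = [0.1, -0.2]
m, n = len(A), len(A[0])

def f(y):
    # Hypothetical smooth test function f : R^2 -> R.
    return math.exp(y[0]) + math.exp(y[1]) + y[0] * y[1]

def grad_f(y):
    # Gradient of the test function, computed by hand.
    return [math.exp(y[0]) + y[1], math.exp(y[1]) + y[0]]

def affine(x):
    # Computes Ax + b.
    return [sum(A[i][j] * x[j] for j in range(n)) + b[i] for i in range(m)]

def g(x):
    return f(affine(x))

def grad_g(x):
    # The formula from the answer: A^T * grad f(Ax + b).
    gy = grad_f(affine(x))
    return [sum(A[i][j] * gy[i] for i in range(m)) for j in range(n)]

def numeric_grad(fun, x, h=1e-6):
    # Central finite differences, one coordinate at a time.
    out = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        out.append((fun(xp) - fun(xm)) / (2 * h))
    return out

x0 = [0.3, -0.4, 0.2]
analytic = grad_g(x0)
numeric = numeric_grad(g, x0)
max_err = max(abs(a - c) for a, c in zip(analytic, numeric))
print("max abs difference:", max_err)
```

The printed difference should be small compared with the gradient entries themselves; it will not be exactly zero because of finite-difference and rounding error.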

For the second derivative, assume $f$ is twice differentiable, so that it satisfies $$f(x+\Delta x)=f(x)+\nabla f(x)^T(\Delta x) + \frac{1}{2}(\Delta x)^T\nabla^2 f(x)(\Delta x) + o[(\|\Delta x\|)^2],$$ where $o[(\|\Delta x\|)^2]$ means $\lim_{\Delta x\to 0} \frac{o[(\|\Delta x\|)^2]}{\|\Delta x\|^2}=0$. Applying this at the point $Ax+b$ with increment $A\Delta x$, we have \begin{align*} g(x+\Delta x) &= f(Ax+b+A\Delta x) \\ &= f(Ax+b)+[\nabla f(Ax+b)]^T \cdot (A\Delta x) \\ &\quad\quad+ \frac{1}{2}(A\Delta x)^T\nabla^2 f(Ax+b)(A\Delta x)+o[(\|A\Delta x\|)^2] \\ &= g(x)+[A^T\nabla f(Ax+b)]^T(\Delta x)\\ &\quad\quad+\frac{1}{2}(\Delta x)^T\left[A^T\nabla^2 f(Ax+b)A\right](\Delta x) + o[(\|A\Delta x\|)^2] \\ &= g(x)+ [\nabla g(x)]^T(\Delta x)+ \frac{1}{2}(\Delta x)^T\left[A^T\nabla^2 f(Ax+b)A\right](\Delta x) + o[(\|A\Delta x\|)^2]. \end{align*} Now, since $\|A\Delta x\|\le\|A\|\,\|\Delta x\|$, whenever $A\Delta x\ne 0$ we have $$\frac{\left|o[(\|A\Delta x\|)^2]\right|}{\|\Delta x\|^2}\le\|A\|^2\,\frac{\left|o[(\|A\Delta x\|)^2]\right|}{\|A\Delta x\|^2}\to 0\quad\text{as }\Delta x\to 0,$$ and the remainder is exactly zero when $A\Delta x=0$, so the remainder is $o[(\|\Delta x\|)^2]$. The matrix $A^T\nabla^2 f(Ax+b)A$ is symmetric whenever $\nabla^2 f(Ax+b)$ is, so by the uniqueness of Taylor expansions, we have $\nabla^2 g(x) = A^T\nabla^2 f(Ax+b)A$.
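The Hessian formula $\nabla^2 g(x) = A^T\nabla^2 f(Ax+b)A$ can be checked numerically in the same spirit. This is a minimal sketch under assumptions made purely for illustration: the test function $f(y)=e^{y_1}+e^{y_2}+y_1y_2$, the matrix `A`, the vector `b`, the point `x0`, and the step size `h` are all hypothetical choices.

```python
import math

# Illustrative data (assumptions, not from the question): m = 2, n = 3.
A = [[1.0, 2.0, -1.0],
     [0.5, -1.0, 3.0]]
b = [0.1, -0.2]
m, n = len(A), len(A[0])

def f(y):
    # Hypothetical smooth test function f : R^2 -> R.
    return math.exp(y[0]) + math.exp(y[1]) + y[0] * y[1]

def hess_f(y):
    # Hessian of the test function, computed by hand.
    return [[math.exp(y[0]), 1.0],
            [1.0, math.exp(y[1])]]

def affine(x):
    # Computes Ax + b.
    return [sum(A[i][j] * x[j] for j in range(n)) + b[i] for i in range(m)]

def g(x):
    return f(affine(x))

def hess_g(x):
    # The formula from the answer: A^T * hess f(Ax + b) * A.
    H = hess_f(affine(x))
    return [[sum(A[i][j] * H[i][k] * A[k][l]
                 for i in range(m) for k in range(m))
             for l in range(n)]
            for j in range(n)]

def numeric_hess(fun, x, h=1e-4):
    # Central second differences for every pair of coordinates (j, l).
    size = len(x)
    out = [[0.0] * size for _ in range(size)]
    for j in range(size):
        for l in range(size):
            def shifted(sj, sl):
                z = list(x)
                z[j] += sj * h
                z[l] += sl * h
                return fun(z)
            out[j][l] = (shifted(1, 1) - shifted(1, -1)
                         - shifted(-1, 1) + shifted(-1, -1)) / (4 * h * h)
    return out

x0 = [0.3, -0.4, 0.2]
analytic = hess_g(x0)
numeric = numeric_hess(g, x0)
max_err = max(abs(analytic[j][l] - numeric[j][l])
              for j in range(n) for l in range(n))
print("max abs difference:", max_err)
```

Second differences are more sensitive to rounding than first differences, which is why a larger step is used here; the agreement should still be close, though not exact.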