[Math] Find the gradient and Hessian of $f(Ax+b)$ for real-valued $f$ and matrix $A$

calculus, derivatives, multivariable-calculus, real-analysis

Let $A\in\mathbb{R}^{m\times n}$, $b\in \mathbb{R}^m$. For
$x\in\mathbb{R}^n$, we define $q(x) = f(Ax+b)$ with
$f:\mathbb{R}^m\to\mathbb{R}$. Find the gradient and Hessian of the
function $q$.

This question seems a bit strange to me. If I wanted the Jacobian, I would just multiply the Jacobian of the outer function by the Jacobian of the inner function. But how do I take the partial derivatives of $q$?

$$\frac{\partial f(Ax+b)}{\partial x_1} = \lim_{h\to 0}\frac{f(A(x_1+h,x_2,\cdots,x_n) + b) - f(Ax+b)}{h}$$

I don't think it helps to think of it this way. I have no way of evaluating this limit without some form of the chain rule. Maybe I can apply the chain rule to $q$, but how?

UPDATE:

By the hint given below,

$$q(x_1,\dots, x_n)=f(f_1,\dots,f_m) = f\left(\sum_{i=1}^n a_{1i}x_i+b_1,\dots,\sum_{i=1}^n a_{mi}x_i+b_m\right)$$

I think the multivariable chain rule can be applied:

$$\frac{\partial q}{\partial x_1} = \frac{\partial f}{\partial f_1}\frac{\partial f_1}{\partial x_1} + \cdots + \frac{\partial f}{\partial f_m}\frac{\partial f_m}{\partial x_1}$$

And see that

$$\frac{\partial f_1}{\partial x_1} = a_{11},\quad\dots,\quad\frac{\partial f_m}{\partial x_1} = a_{m1}$$

So we get

$$\frac{\partial q}{\partial x_1} = \frac{\partial f}{\partial f_1}a_{11} + \cdots + \frac{\partial f}{\partial f_m}a_{m1}$$

In general:

$$\frac{\partial q}{\partial x_j} = \frac{\partial f}{\partial f_1}a_{1j} + \cdots + \frac{\partial f}{\partial f_m}a_{mj}$$

I think there's still a lot of work to do.

Best Answer

Background knowledge: if $F:\mathbb R^n \to \mathbb R^m$ is differentiable at $x$, then $F'(x)$ is an $m \times n$ matrix.
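
As a quick illustration of this convention, here is how the shapes work out for the two maps in the question. This is only a sketch: $F$ is used here as a name for the affine inner map and $y$ for a point of $\mathbb{R}^m$, neither of which appears in the original post.

$$F(x) = Ax + b \;\Longrightarrow\; F'(x) = A \in \mathbb{R}^{m\times n}, \qquad f:\mathbb{R}^m\to\mathbb{R} \;\Longrightarrow\; f'(y)\in\mathbb{R}^{1\times m},\quad \nabla f(y) = f'(y)^T \in \mathbb{R}^{m}.$$

So a product such as $f'(y)\,A$ has shape $(1\times m)(m\times n) = 1\times n$, which is the shape expected for the derivative of a real-valued function on $\mathbb{R}^n$.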


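Let $g(x) = f(Ax + b)$ (this is the function called $q$ in the question), and assume $f$ is twice differentiable. By the chain rule,

$$ g'(x) = f'(Ax + b)A. $$

If we use the convention that the gradient is a column vector, then

$$ \nabla g(x) = g'(x)^T = A^T \nabla f(Ax + b). $$

The Hessian of $g$ is the derivative of the function $x \mapsto \nabla g(x)$. Since $A^T$ is a constant matrix and the derivative of $y \mapsto \nabla f(y)$ is $\nabla^2 f(y)$, applying the chain rule once more gives

$$ \nabla^2 g(x) = A^T \nabla^2 f(Ax + b) A. $$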
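
If you want to convince yourself numerically, the two formulas can be compared against central finite differences. The following is a minimal sketch in plain Python, not part of the original answer. The test function $f(y) = \sum_i e^{y_i}$, the particular $A$, $b$, $x$, and all helper names are assumptions chosen only for illustration.

```python
import math

def affine(A, x, b):
    # Computes Ax + b for A given as a list of rows.
    return [sum(a * xj for a, xj in zip(row, x)) + bi for row, bi in zip(A, b)]

# Assumed test function: f(y) = sum_i exp(y_i), so that
# grad f(y) = exp(y) componentwise and hess f(y) = diag(exp(y)).
def f(y):
    return sum(math.exp(yi) for yi in y)

def grad_f(y):
    return [math.exp(yi) for yi in y]

def hess_f(y):
    m = len(y)
    return [[math.exp(y[i]) if i == j else 0.0 for j in range(m)] for i in range(m)]

def g(A, b, x):
    return f(affine(A, x, b))

def grad_g(A, b, x):
    # Formula from the answer: A^T grad f(Ax + b).
    gf = grad_f(affine(A, x, b))
    m, n = len(A), len(x)
    return [sum(A[i][j] * gf[i] for i in range(m)) for j in range(n)]

def hess_g(A, b, x):
    # Formula from the answer: A^T hess f(Ax + b) A.
    H = hess_f(affine(A, x, b))
    m, n = len(A), len(x)
    return [[sum(A[i][j] * H[i][l] * A[l][k] for i in range(m) for l in range(m))
             for k in range(n)] for j in range(n)]

def fd_grad(func, x, h=1e-5):
    # Central finite-difference approximation of the gradient of func at x.
    out = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        out.append((func(xp) - func(xm)) / (2 * h))
    return out

def fd_jacobian(vec_func, x, h=1e-5):
    # Central finite-difference Jacobian J[j][k] = d vec_func_j / d x_k.
    n = len(x)
    cols = []
    for k in range(n):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        fp, fm = vec_func(xp), vec_func(xm)
        cols.append([(p - q) / (2 * h) for p, q in zip(fp, fm)])
    return [[cols[k][j] for k in range(n)] for j in range(len(cols[0]))]

def max_abs_diff(U, V):
    # Largest entrywise difference between two vectors or two matrices.
    if isinstance(U[0], list):
        return max(max_abs_diff(u, v) for u, v in zip(U, V))
    return max(abs(u - v) for u, v in zip(U, V))

# Assumed example data with m = 3, n = 2.
A = [[1.0, 2.0], [0.5, -1.0], [0.0, 3.0]]
b = [0.1, -0.2, 0.3]
x = [0.2, -0.1]

grad_err = max_abs_diff(grad_g(A, b, x), fd_grad(lambda z: g(A, b, z), x))
hess_err = max_abs_diff(hess_g(A, b, x), fd_jacobian(lambda z: grad_g(A, b, z), x))
print("max gradient error:", grad_err)
print("max Hessian error:", hess_err)
```

Both printed errors should be small, limited only by the accuracy of the finite differences. Note that the Hessian check differentiates the analytic gradient, which mirrors the statement that the Hessian is the derivative of $x \mapsto \nabla g(x)$.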