Let $A\in\mathbb{R}^{m\times n}$, $b\in \mathbb{R}^m$. For
$x\in\mathbb{R}^n$, we define $q(x) = f(Ax+b)$ with
$f:\mathbb{R}^m\to\mathbb{R}$. Find the gradient and Hessian of the
function $q$.
This question seems a bit strange to me. If I were to take the Jacobian, I would just compose the Jacobian of the outer function with the Jacobian of the inner function. But how do I take the partial derivatives of $q$?
$$\frac{\partial f(Ax+b)}{\partial x_1} = \lim_{h\to 0}\frac{f(A(x_1+h,x_2,\cdots,x_n) + b) - f(Ax+b)}{h}$$
I don't think looking at it this way helps. I have no means of finding this limit without using some form of the chain rule. Maybe I can apply the chain rule to $q$, but how?
UPDATE:
By the hint given below,
$$q(x_1,\dots, x_n)=f(f_1,\dots,f_m) = f\left(\sum_{i=1}^n a_{1i}x_i+b_1,\dots,\sum_{i=1}^n a_{mi}x_i+b_m\right)$$
where $f_k = \sum_{i=1}^n a_{ki}x_i+b_k$ is the $k$-th component of $Ax+b$ (there are $m$ of them, one per row of $A$). I think the multivariable chain rule can be applied:
$$\frac{\partial q}{\partial x_1} = \frac{\partial f}{\partial f_1}\frac{\partial f_1}{\partial x_1} + \cdots + \frac{\partial f}{\partial f_m}\frac{\partial f_m}{\partial x_1}$$
And see that
$$\frac{\partial f_1}{\partial x_1} = a_{11},\quad\dots,\quad\frac{\partial f_m}{\partial x_1} = a_{m1}$$
So we get
$$\frac{\partial q}{\partial x_1} = \frac{\partial f}{\partial f_1}a_{11} + \cdots + \frac{\partial f}{\partial f_m}a_{m1}$$
In general:
$$\frac{\partial q}{\partial x_j} = \frac{\partial f}{\partial f_1}a_{1j} + \cdots + \frac{\partial f}{\partial f_m}a_{mj}$$
I think there's still a lot of work to do.
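A sketch of how the remaining work might go in index notation. The shorthand $y = Ax+b$ with components $y_1,\dots,y_m$ is my own and is not part of the problem statement, and I am assuming $f$ is twice differentiable. The first partials collect into a sum over the $m$ components of $y$:
$$\frac{\partial q}{\partial x_j}(x) = \sum_{i=1}^m a_{ij}\,\frac{\partial f}{\partial y_i}(Ax+b), \qquad j=1,\dots,n.$$
Each $\frac{\partial f}{\partial y_i}(Ax+b)$ is again a function of $x$ of the same form, so the same chain rule can be applied once more:
$$\frac{\partial^2 q}{\partial x_k\,\partial x_j}(x) = \sum_{i=1}^m\sum_{l=1}^m a_{ij}\,\frac{\partial^2 f}{\partial y_l\,\partial y_i}(Ax+b)\,a_{lk}, \qquad j,k=1,\dots,n.$$
If this is right, the first sum is the $j$-th entry of $A^T\nabla f(Ax+b)$ and the double sum is the $(j,k)$ entry of $A^T\,\nabla^2 f(Ax+b)\,A$.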
Best Answer
Background knowledge: if $F:\mathbb R^n \to \mathbb R^m$ is differentiable at $x$, then $F'(x)$ is an $m \times n$ matrix.
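Let $g(x) = f(Ax + b)$ (this is the function $q$ from the question). By the chain rule, $$ g'(x) = f'(Ax + b)A. $$ If we use the convention that the gradient is a column vector, then $$ \nabla g(x) = g'(x)^T = A^T \nabla f(Ax + b). $$ The Hessian of $g$ is the derivative of the function $x \mapsto \nabla g(x)$, assuming $f$ is twice differentiable. The constant matrix $A^T$ factors out, and the chain rule gives $\nabla^2 f(Ax + b)A$ as the derivative of $x \mapsto \nabla f(Ax + b)$, so $$ \nabla^2 g(x) = A^T \nabla^2 f(Ax + b) A. $$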
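As a sanity check, the two formulas can be compared against finite differences. The sketch below is only an illustration under assumptions of my own: the test function $f(y)=\sum_i e^{y_i}$, the particular $A$, $b$, $x$, the step sizes, and all helper names are hypothetical choices, not part of the question.

```python
import math

def transpose(M):
    return [list(col) for col in zip(*M)]

def matvec(M, v):
    return [sum(a * vi for a, vi in zip(row, v)) for row in M]

def matmul(M, N):
    Nt = transpose(N)
    return [[sum(a * c for a, c in zip(row, col)) for col in Nt] for row in M]

# Assumed example: f(y) = sum_i exp(y_i), so that
# grad f(y) = exp(y) componentwise and hess f(y) = diag(exp(y)).
def f(y):
    return sum(math.exp(yi) for yi in y)

def grad_f(y):
    return [math.exp(yi) for yi in y]

def hess_f(y):
    m = len(y)
    return [[math.exp(y[i]) if i == j else 0.0 for j in range(m)]
            for i in range(m)]

def affine(A, b, x):
    # y = A x + b
    return [v + bi for v, bi in zip(matvec(A, x), b)]

def q(A, b, x):
    return f(affine(A, b, x))

def grad_q(A, b, x):
    # Formula from the answer: A^T grad f(Ax + b)
    return matvec(transpose(A), grad_f(affine(A, b, x)))

def hess_q(A, b, x):
    # Formula from the answer: A^T hess f(Ax + b) A
    return matmul(matmul(transpose(A), hess_f(affine(A, b, x))), A)

def shifted(x, steps):
    # Copy of x with x[j] += d for each (j, d) in steps.
    z = list(x)
    for j, d in steps:
        z[j] += d
    return z

def fd_grad(A, b, x, h=1e-5):
    # Central differences of q, one coordinate at a time.
    return [(q(A, b, shifted(x, [(j, h)])) - q(A, b, shifted(x, [(j, -h)])))
            / (2 * h) for j in range(len(x))]

def fd_hess(A, b, x, h=1e-4):
    # Second-order central differences of q itself (grad_q is not used).
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(n):
            H[j][k] = (q(A, b, shifted(x, [(j, h), (k, h)]))
                       - q(A, b, shifted(x, [(j, h), (k, -h)]))
                       - q(A, b, shifted(x, [(j, -h), (k, h)]))
                       + q(A, b, shifted(x, [(j, -h), (k, -h)]))) / (4 * h * h)
    return H

# Arbitrary test data with m = 3, n = 2.
A = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]]
b = [0.1, -0.2, 0.3]
x = [0.5, -0.25]

grad_err = max(abs(u - v) for u, v in zip(grad_q(A, b, x), fd_grad(A, b, x)))
hess_err = max(abs(u - v)
               for ru, rv in zip(hess_q(A, b, x), fd_hess(A, b, x))
               for u, v in zip(ru, rv))
print("max gradient error:", grad_err)
print("max Hessian error:", hess_err)
```

Both errors should come out small compared with the entries themselves, which is consistent with the formulas but of course does not prove them. Note that the Hessian returned by `hess_q` is $n \times n$, matching the shape of $A^T \nabla^2 f(Ax+b) A$.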