Matrix Differentiation Chain Rule

chain-rule, implicit-differentiation, matrices

I want to compute the derivative of a scalar with respect to a matrix, where the scalar depends on the matrix through a vector.
For instance, let the scalar be $l = (x-c)^T (x-c)$, where $x, c \in \mathbb{R}^p$ are vectors,
and $x = Az$, where $A$ is a $p \times q$ matrix and $z \in \mathbb{R}^q$ is a vector.

Now, if I want to calculate $\frac{d l}{d A}$, I can write $\frac{d l}{d A} = \frac{d l}{d x} \frac{d x}{d A}$.

$\frac{d l}{d x}$ is easy to compute, but I need help with $\frac{d x}{d A}$.

Best Answer

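Define the vector $w = Az - c$ and write the function as $\ell = \|w\|^2$.
Then calculate the gradient with respect to $w$ and change variables from $w \to A$, using $dw = dA\,z$ (since $z$ and $c$ are held fixed): $$\eqalign{ \frac{\partial\ell}{\partial w} &= 2w \\ d\ell &= 2w:dw \;=\; 2w:dA\,z \;=\; 2wz^T:dA \\ \frac{\partial\ell}{\partial A} &= 2wz^T \\ }$$ where a colon denotes the Frobenius inner product, i.e. $$A:B = {\rm Tr}(A^TB) = {\rm Tr}(B^TA) = B:A$$ The step $2w:dA\,z = 2wz^T:dA$ follows from the cyclic property of the trace: ${\rm Tr}(w^T\,dA\,z) = {\rm Tr}(z\,w^T\,dA) = {\rm Tr}\big((wz^T)^T dA\big)$.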
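As a sanity check, the result $\frac{\partial\ell}{\partial A} = 2wz^T$ can be compared against a finite-difference approximation. The following is only a minimal sketch using NumPy; the dimensions, the random test data, and the helper names `loss`, `analytic_grad`, and `numeric_grad` are assumptions made for illustration and are not part of the original answer.

```python
import numpy as np

# Illustrative sizes and random data (assumed, not from the original post).
rng = np.random.default_rng(0)
p, q = 4, 3
A = rng.standard_normal((p, q))
z = rng.standard_normal(q)
c = rng.standard_normal(p)

def loss(A):
    """l = (Az - c)^T (Az - c)."""
    w = A @ z - c
    return w @ w

def analytic_grad(A):
    """Closed form from the answer: dl/dA = 2 w z^T."""
    w = A @ z - c
    return 2.0 * np.outer(w, z)

def numeric_grad(A, eps=1e-6):
    """Central finite differences, one entry of A at a time."""
    G = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            E = np.zeros_like(A)
            E[i, j] = eps
            G[i, j] = (loss(A + E) - loss(A - E)) / (2 * eps)
    return G

grad_analytic = analytic_grad(A)
grad_numeric = numeric_grad(A)
max_abs_err = np.max(np.abs(grad_analytic - grad_numeric))
print("max abs difference:", max_abs_err)
```

Because $\ell$ is quadratic in $A$, the central difference has no truncation error here, so any remaining difference between the two gradients should be due to floating-point round-off only.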


NB: By using the differential $(d\ell)$ in the intermediate step, there was no need to calculate the third-order tensor $\left(\frac{\partial x}{\partial A}\right)$ which is hard to calculate and awkward to manipulate algebraically.
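For comparison, the direct chain-rule route can be sketched in index notation, which also answers the original request for $\frac{\partial x}{\partial A}$. The indices $i, j, k, m$ and the Kronecker delta $\delta_{ij}$ are notation introduced here for illustration; they do not appear in the original answer. $$\eqalign{ x_i &= \sum_{m} A_{im} z_m \\ \frac{\partial x_i}{\partial A_{jk}} &= \delta_{ij}\, z_k \\ \frac{\partial \ell}{\partial A_{jk}} &= \sum_i \frac{\partial \ell}{\partial x_i}\,\frac{\partial x_i}{\partial A_{jk}} \;=\; \sum_i 2w_i\,\delta_{ij}\,z_k \;=\; 2w_j z_k \\ }$$ The last line is the $(j,k)$ entry of $2wz^T$, so both routes agree; the differential approach simply avoids writing out the three-index object $\delta_{ij} z_k$.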
