[Math] Dimensions of matrix derivative and chain rule

derivatives, linear algebra, matrices

I'm trying to calculate the derivative of the following expression with respect to the square matrix $X$:
$$
\frac{∂ a^T(A - X)^{-1}b}{∂ X}
$$

where $A$ is a constant square matrix, and $a$ and $b$ are vectors. Since the top-level expression is a scalar, the derivative should be a matrix of the same shape as $X$. However, when I apply the chain rule I get:
$$
ab^T \frac{∂(A-X)^{-1}}{∂X} = ab^T (A-X)^{-2} \frac{∂(A-X)}{∂X}
$$

There must be a mistake somewhere, because $\frac{∂(A-X)}{∂X}$ is a matrix-by-matrix derivative and hence an object with more than two dimensions. I suspect that I misunderstand how the chain rule works for matrices, but I cannot find the mistake.

Best Answer

Rather than the chain rule, use differentials for this sort of problem.

For convenience, define a new variable
$$\eqalign{
M &= A-X \cr
dM &= -dX \cr
}$$

Express the function in terms of the Frobenius (:) inner product, $P:Q = \operatorname{tr}(P^TQ)$, and find its differential, using $dM^{-1} = -M^{-1}\,dM\,M^{-1}$:
$$\eqalign{
f &= M^{-1}:ab^T \cr\cr
df &= dM^{-1}:ab^T \cr
&= -M^{-1}\,dM\,M^{-1}:ab^T \cr
&= M^{-1}\,dX\,M^{-1}:ab^T \cr
&= M^{-T}ab^TM^{-T}:dX \cr
}$$

Since $df=\big(\frac{\partial f}{\partial X}:dX\big)$, the gradient must be
$$\eqalign{
\frac{\partial f}{\partial X} &= M^{-T}ab^TM^{-T} \cr
&= (A^T-X^T)^{-1}ab^T(A^T-X^T)^{-1} \cr
}$$
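As a sanity check, the closed-form gradient can be compared numerically against central finite differences of $f(X) = a^T(A-X)^{-1}b$. The following is a minimal sketch in Python with NumPy; the function names, the matrix size `n`, the step `eps`, and the random test data are all illustrative assumptions rather than part of the derivation.

```python
import numpy as np

def f(X, A, a, b):
    # Scalar function a^T (A - X)^{-1} b, evaluated via a linear solve.
    return a @ np.linalg.solve(A - X, b)

def grad_closed_form(X, A, a, b):
    # Gradient from the differential argument: M^{-T} a b^T M^{-T}, with M = A - X.
    Minv_T = np.linalg.inv(A - X).T
    return Minv_T @ np.outer(a, b) @ Minv_T

def grad_finite_diff(X, A, a, b, eps=1e-6):
    # Entry (i, j) approximates df/dX_ij with a central difference.
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = eps
            G[i, j] = (f(X + E, A, a, b) - f(X - E, A, a, b)) / (2 * eps)
    return G

# Hypothetical test data; the diagonal shift keeps A - X comfortably invertible.
rng = np.random.default_rng(0)
n = 4
A = 5.0 * np.eye(n) + 0.5 * rng.standard_normal((n, n))
X = 0.5 * rng.standard_normal((n, n))
a = rng.standard_normal(n)
b = rng.standard_normal(n)

G_exact = grad_closed_form(X, A, a, b)
G_approx = grad_finite_diff(X, A, a, b)
max_abs_err = np.max(np.abs(G_exact - G_approx))
print("max abs difference:", max_abs_err)
```

If the two gradients agree up to finite-difference error, that supports both the formula and the layout convention used here, in which entry $(i,j)$ of the gradient is $\partial f/\partial X_{ij}$, so the result has the same shape as $X$.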