[Math] Derivative of inverse quadratic function of a matrix

linear algebra, matrices

I have been stuck with the following derivative for some time:
$$
\frac{\partial\,\mathbf{b}^\mathrm{T}(\mathbf{X}\mathbf{C}\mathbf{X}^\mathrm{T})^{-1}\mathbf{b}}{\partial\,\mathbf{X}}
$$, where $\mathbf{b}\in\mathbb{R}^{M\times1}$, $\mathbf{X}\in\mathbb{R}^{M\times N}$, and $\mathbf{C}\in\mathbb{R}^{N\times N}$ is symmetric.

I had a look in the Matrix Cookbook, but I am still not sure how to deal with the inverse of a matrix in the second order form. Is it correct to apply the chain rule?
$$\frac{\partial\,\mathbf{b}^\mathrm{T}(\mathbf{X}\mathbf{C}\mathbf{X}^\mathrm{T})^{-1}\mathbf{b}}{\partial\,\mathbf{X}} =
\frac{\partial\,\mathbf{b}^\mathrm{T}(\mathbf{X}\mathbf{C}\mathbf{X}^\mathrm{T})^{-1}\mathbf{b}}{\partial\,\mathbf{XCX}^\mathrm{T}}\cdot
\frac{\partial \, \mathbf{XCX}^{\mathrm{T}}}{\partial \, \mathbf{X}}.$$

In this case, the first partial derivative will be:
$$
\frac{\partial\,\mathbf{b}^\mathrm{T}(\mathbf{X}\mathbf{C}\mathbf{X}^\mathrm{T})^{-1}\mathbf{b}}{\partial\,\mathbf{XCX}^\mathrm{T}} =
-(\mathbf{X}\mathbf{C}\mathbf{X}^\mathrm{T})^\mathrm{-T}\mathbf{b}\mathbf{b}^\mathrm{T}(\mathbf{X}\mathbf{C}\mathbf{X}^\mathrm{T})^{-\mathrm{T}}
$$
(using Eq. 55 of the Matrix Cookbook). The second factor, $\frac{\partial \, \mathbf{XCX}^{\mathrm{T}}}{\partial \, \mathbf{X}}$, will be something like a fourth-rank tensor. How can I arrive at a result that is an $M\times N$ matrix?

I would really appreciate it if someone could help me with this or offer some advice.

Best Answer

Setting $D = X C X^T$, we use (53) from the Matrix Cookbook:

$$\frac{\partial\,D^{-1}}{\partial \, x_{ij}} = - D^{-1} \frac{\partial\,D}{\partial \, x_{ij}} D^{-1} $$
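
For readers who want to see where this identity comes from, one standard argument (not spelled out in the Cookbook reference above, and assuming $D$ is invertible in a neighbourhood of the point of interest) is to differentiate $D D^{-1} = I$ with the product rule:

$$ \frac{\partial\,D}{\partial \, x_{ij}} D^{-1} + D \, \frac{\partial\,D^{-1}}{\partial \, x_{ij}} = 0 \quad\Longrightarrow\quad \frac{\partial\,D^{-1}}{\partial \, x_{ij}} = - D^{-1} \frac{\partial\,D}{\partial \, x_{ij}} D^{-1} $$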

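Besides, formula (72), which is stated for $X^T B X$, tells us after replacing $X$ by $X^T$ that

$$ \frac{\partial \,( X C X^T )}{\partial \, x_{ij}} = J^{ij} C X^T + X C J^{ji} $$

(where $J^{ij}$ is the $M\times N$ "singleton matrix", with 1 in position $(i,j)$ and zero elsewhere, and $J^{ji} = (J^{ij})^T$).

So

$$ \frac{\partial \, b^T (X C X^T)^{-1} b }{\partial \, x_{ij}} = - b^T D^{-1} (J^{ij} C X^T + X C J^{ji} ) D^{-1} b = -2 u^T J^{ij} C X^T u $$

where $u= D^{-1}b$, and we've used the fact that $C$ is symmetric, and hence so is $D$: the two terms are scalars that are transposes of each other. Now formula (431) says $ u^T A J^{ij} B u = (A^T u u^T B^T)_{i,j}$, hence with $A = I$ and $B = C X^T$ the RHS is equal to

$$ -2 \, (u u^T X C)_{i,j} $$

So

$$\frac{\partial \, b^T (X C X^T)^{-1} b }{\partial \, X} = -2 u u^T X C = - 2 (X C X^T)^{-1} b \, b^T (X C X^T)^{-1} X C $$

which is an $M\times N$ matrix whose $(i,j)$ entry is the derivative with respect to $x_{ij}$. (In the transposed, $N\times M$ layout the same result reads $-2 C X^T u u^T$.)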
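
A closed form like this is easy to sanity-check numerically. The sketch below compares it with central finite differences, storing the gradient so that entry $(i,j)$ is the derivative with respect to $x_{ij}$, i.e. as the $M\times N$ matrix $-2uu^TXC$ (the transpose of $-2CX^Tuu^T$). It uses only the Python standard library. The helper names (`matmul`, `solve`, `f`, `grad_closed_form`, `grad_finite_diff`) and the small test matrices are illustrative assumptions, not part of the original post.

```python
def matmul(A, B):
    """Plain-Python matrix product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(aug[r][k]))
        aug[k], aug[p] = aug[p], aug[k]
        for r in range(k + 1, n):
            factor = aug[r][k] / aug[k][k]
            for c in range(k, n + 1):
                aug[r][c] -= factor * aug[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):
        s = sum(aug[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (aug[k][n] - s) / aug[k][k]
    return x


def f(X, C, b):
    """f(X) = b^T (X C X^T)^{-1} b."""
    D = matmul(matmul(X, C), transpose(X))
    u = solve(D, b)
    return sum(bi * ui for bi, ui in zip(b, u))


def grad_closed_form(X, C, b):
    """M x N gradient -2 u u^T X C with u = (X C X^T)^{-1} b (C symmetric)."""
    XC = matmul(X, C)                      # M x N
    D = matmul(XC, transpose(X))           # M x M
    u = solve(D, b)
    v = matmul([u], XC)[0]                 # row vector u^T X C, length N
    return [[-2.0 * ui * vj for vj in v] for ui in u]


def grad_finite_diff(X, C, b, h=1e-6):
    """Central finite differences, one entry x_ij at a time."""
    G = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            G[i][j] = (f(Xp, C, b) - f(Xm, C, b)) / (2 * h)
    return G


# Small hand-picked example (M = 2, N = 3); C is symmetric.
X = [[1, 2, 0], [0, 1, -1]]
C = [[2, 1, 0], [1, 3, 1], [0, 1, 2]]
b = [1, -1]

G_closed = grad_closed_form(X, C, b)
G_fd = grad_finite_diff(X, C, b)
max_err = max(abs(gc - gf)
              for rc, rf in zip(G_closed, G_fd)
              for gc, gf in zip(rc, rf))
print("max |closed form - finite difference| =", max_err)
```

For this example $D = XCX^T = \begin{pmatrix} 18 & 5 \\ 5 & 3 \end{pmatrix}$ and $u = \tfrac{1}{29}(8, -23)^T$, so the $(1,1)$ entry of the gradient can also be checked by hand: it equals $-144/841$.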