[Math] How to get the derivative with respect to a scalar for $f(a)=y^T(aI+B)^{-1}y$

derivatives matrix-calculus

I have an equation which I simplified to the following form:
$$f(a)=y^T(aI+B)^{-1}y$$
where $a$ is a scalar, $y$ is an $N\times 1$ column vector, $I$ is the $N\times N$ identity matrix, and $B$ is any symmetric $N\times N$ matrix such that $aI+B$ is invertible.

I want to find the derivative with respect to $a$.

The function $f(a)$ takes a scalar and returns a scalar, so its derivative with respect to $a$ should also be a scalar. I wanted to apply the chain rule using the following decomposition:

$$g(A) = y^TA^{-1}y$$
$$h(a) = aI+B$$

hoping that:

$$\frac{\partial f}{\partial a} = \frac{\partial g}{\partial h}\frac{\partial h}{\partial a}$$
and then using the known identity for $y^TA^{-1}y$:
$$\frac{\partial g}{\partial h}=-A^{-T}yy^TA^{-T}$$

But even without proceeding further, with this as one of the factors I won't get a scalar.

Am I applying the chain rule incorrectly in this setting? Could you please explain how to fix my approach and calculate the derivative?

Best Answer

Instead of the chain rule, use differentials.
$$\begin{aligned}
A &= aI+B \\
dA &= I\,da \\
dA^{-1} &= -A^{-1}\,dA\,A^{-1} = -A^{-2}\,da \\
df &= y^T\,dA^{-1}\,y = -y^TA^{-2}y\,da \\
\frac{df}{da} &= -y^TA^{-2}y
\end{aligned}$$
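
As for the chain-rule attempt in the question: it can be made to work if the two factors are contracted entry by entry (a trace) rather than multiplied as matrices. With $\partial h/\partial a = I$, this gives
$$\frac{df}{da}=\operatorname{tr}\!\left(\left(\frac{\partial g}{\partial A}\right)^{T}\frac{\partial h}{\partial a}\right)=-\operatorname{tr}\!\left(A^{-1}yy^TA^{-1}\right)=-y^TA^{-2}y,$$
which is a scalar and agrees with the differential result.

Both expressions can be sanity-checked numerically. The following is only a sketch in NumPy: the helper names `f` and `df_da`, the random test data, and the choice of a positive semidefinite $B$ with $a=1$ (so that $aI+B$ is certainly invertible) are assumptions made for illustration, not part of the original problem.

```python
import numpy as np

def f(a, B, y):
    """f(a) = y^T (aI + B)^{-1} y, computed with a linear solve."""
    A = a * np.eye(B.shape[0]) + B
    return float(y @ np.linalg.solve(A, y))

def df_da(a, B, y):
    """Closed form -y^T A^{-2} y from the differential argument."""
    A = a * np.eye(B.shape[0]) + B
    z = np.linalg.solve(A, y)   # z = A^{-1} y
    return float(-(z @ z))      # A symmetric, so y^T A^{-2} y = z^T z

# Illustrative test data (assumed): B = M M^T is symmetric PSD,
# so A = aI + B with a = 1 has all eigenvalues >= 1.
rng = np.random.default_rng(0)
N = 5
M = rng.standard_normal((N, N))
B = M @ M.T
y = rng.standard_normal(N)
a = 1.0

analytic = df_da(a, B, y)

# Central finite difference in the scalar a.
h = 1e-6
numeric = (f(a + h, B, y) - f(a - h, B, y)) / (2 * h)

# Chain rule with a trace contraction: tr((dg/dA)^T dh/da), dh/da = I.
A_inv = np.linalg.inv(a * np.eye(N) + B)
G = -A_inv.T @ np.outer(y, y) @ A_inv.T   # dg/dA = -A^{-T} y y^T A^{-T}
chain = float(np.trace(G.T @ np.eye(N)))

print(analytic, numeric, chain)
```

The three printed numbers should agree to within finite-difference error; they are also negative, as expected, since $y^TA^{-2}y = \lVert A^{-1}y\rVert^2 \ge 0$ for symmetric $A$.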