I have an equation that I simplified to the following form:
$$f(a)=y^T(aI+B)^{-1}y$$
where $a$ is a scalar, $y$ is an $N \times 1$ column vector, $I$ is the $N \times N$ identity matrix, and $B$ is any symmetric $N \times N$ matrix such that $aI+B$ is invertible.
I want to find the derivative of $f$ with respect to $a$.
The function $f(a)$ takes a scalar and returns a scalar, so its derivative with respect to $a$ should also be a scalar. I then wanted to apply the chain rule using the following decomposition:
$$g(A) = y^TA^{-1}y$$
$$h(a) = aI+B$$
hoping that:
$$\frac{\partial f}{\partial a} = \frac{\partial g}{\partial h}\frac{\partial h}{\partial a}$$
and then using a known identity for the derivative of $y^TA^{-1}y$ with respect to $A$:
$$\frac{\partial g}{\partial h}=-A^{-T}yy^TA^{-T}$$
But even without proceeding further, with this matrix as one of the factors I won't get a scalar.
Am I applying the chain rule incorrectly in this setting? Could you please explain how to fix my approach and calculate the derivative?
Best Answer
Instead of the chain rule, use differentials.
$$\eqalign{ A &= aI+B \cr dA&=I\,da \cr dA^{-1} &= -A^{-1}\,dA\,A^{-1} = -A^{-2}\,da \cr \cr df &= y^T\,dA^{-1}\,y = -y^TA^{-2}y\,\,da \cr \cr \frac{df}{da} &= -y^TA^{-2}y \cr }$$
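As a sanity check, the result $-y^TA^{-2}y$ can be compared with a finite-difference estimate. The sketch below is only an illustration: the size `N`, the random `y` and `B`, the value of `a`, and the step `eps` are all arbitrary assumptions, and `B` is built as `M @ M.T` purely so that $aI+B$ is guaranteed invertible for $a>0$. It also evaluates the chain rule from the question with the two matrix factors contracted by a trace, $\frac{df}{da} = \operatorname{tr}\!\left(\left(\frac{\partial g}{\partial A}\right)^T \frac{\partial A}{\partial a}\right)$, which is how a matrix-valued intermediate collapses to a scalar.

```python
import numpy as np

# Arbitrary test data (assumptions for illustration only).
rng = np.random.default_rng(0)
N = 5
y = rng.standard_normal(N)
M = rng.standard_normal((N, N))
B = M @ M.T            # symmetric PSD, so a*I + B is invertible for a > 0
a = 1.5
I = np.eye(N)

def f(a):
    """f(a) = y^T (aI + B)^{-1} y, computed with a linear solve."""
    return y @ np.linalg.solve(a * I + B, y)

A = a * I + B

# Result from the differential approach: -y^T A^{-2} y.
# A is symmetric here, so y^T A^{-2} y = z^T z with z = A^{-1} y.
z = np.linalg.solve(A, y)
analytic = -(z @ z)

# Chain rule with a trace contraction:
# dg/dA = -A^{-T} y y^T A^{-T},  dA/da = I.
A_inv = np.linalg.inv(A)
dg_dA = -A_inv.T @ np.outer(y, y) @ A_inv.T
chain = np.trace(dg_dA.T @ I)

# Central finite difference as an independent numerical estimate.
eps = 1e-6
numeric = (f(a + eps) - f(a - eps)) / (2 * eps)

print(analytic, chain, numeric)
```

The three printed values should agree up to finite-difference error, which suggests the original chain-rule attempt was missing only the trace (Frobenius inner product) between the two factors.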