Solved – Derivative of covariance w.r.t. inverse covariance when elements are function of a vector

Tags: covariance-matrix, matrix-calculus, maximum likelihood, multivariate normal distribution

I have this equation:

$$\nabla f^T x+ \nabla f^T \Sigma^{-1} (\Sigma \circ Q)x = -\frac{1}{2}\nabla f^T \Sigma^{-1} \nabla f \tag{1}$$

where $\nabla f,x$ are vectors, and

$$\nabla f_i = a_i - E[\nabla r_i]$$

($a_i$ being a scalar)

$\Sigma$ is the covariance matrix of the vector $\nabla r$, "$\circ$" denotes the Hadamard product, and the elements of $Q$ are covariance elasticities w.r.t. $x$, defined as:

$$Q_{ij} = \frac{1}{\delta_{ij} + 1}\frac{x_i}{\sigma_{ij}}\frac{\partial \sigma_{ij}}{\partial x_i}$$

(The factor involving the Kronecker delta only removes a factor of 2 that appears in the diagonal terms.)

Keep in mind that each $(i,j)$ element of $\Sigma$ (and hence of $Q$) is a function of $x_i$ and $x_j$; the elements of $\Sigma^{-1}$, which mix all entries of $\Sigma$, generally depend on every component of $x$.
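To make the definition of $Q$ concrete, here is a minimal NumPy sketch that evaluates $Q_{ij}$ by central finite differences. The covariance model $\sigma_{ij}(x) = \rho_{ij}(x_i x_j)^b$ and the names `RHO`, `B_EXP`, `sigma_of_x` and `elasticity_matrix` are hypothetical, chosen only for illustration. For that model every elasticity equals $b$, except that the raw diagonal elasticity is $2b$, which is exactly what the Kronecker-delta factor halves.

```python
import numpy as np

# Hypothetical covariance model (illustration only, not from the question):
# sigma_ij(x) = rho_ij * (x_i * x_j) ** b, so sigma_ij depends on x_i and x_j.
RHO = np.array([[1.0, 0.3, 0.2],
                [0.3, 1.0, 0.4],
                [0.2, 0.4, 1.0]])
B_EXP = 0.7


def sigma_of_x(x):
    return RHO * np.outer(x, x) ** B_EXP


def elasticity_matrix(sigma_fn, x, h=1e-6):
    """Q_ij = x_i / sigma_ij * d(sigma_ij)/d(x_i) / (1 + delta_ij), via central differences."""
    n = len(x)
    S = sigma_fn(x)
    Q = np.empty((n, n))
    for i in range(n):
        e = np.zeros(n)
        e[i] = h
        # Row i of dS holds d(sigma_ij)/d(x_i) for every j (diagonal included).
        dS = (sigma_fn(x + e) - sigma_fn(x - e)) / (2 * h)
        Q[i, :] = x[i] * dS[i, :] / S[i, :]
    Q[np.diag_indices(n)] /= 2.0  # the 1/(delta_ij + 1) factor
    return Q


x = np.array([1.0, 2.0, 3.0])
Q = elasticity_matrix(sigma_of_x, x)
print(np.round(Q, 6))
```

With `B_EXP = 0.7`, every printed entry is 0.7 up to finite-difference error; dropping the halving step would leave the diagonal at $2b = 1.4$.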

I want to differentiate equation 1 w.r.t. $\Sigma^{-1}$.

I start with the "trace trick".

$$\nabla f^T x + tr((x \nabla f^T) \Sigma^{-1} (\Sigma \circ Q)) = -\frac{1}{2}tr(\nabla f \nabla f^T \Sigma^{-1})$$

Then, I invoke the trace's invariance under cyclic permutations to rewrite the second term on the LHS as follows:

$$\nabla f^T x + tr((\Sigma \circ Q) (x \nabla f^T) \Sigma^{-1}) = -\frac{1}{2}tr(\nabla f \nabla f^T \Sigma^{-1})$$

Now everything is set up to apply the basic formula for differentiating a trace with respect to a matrix, $\frac{\partial}{\partial X}tr(XA) = A^T$ (Eq. 100 in The Matrix Cookbook):

$$\nabla f\, x^T (\Sigma \circ Q)^T = -\frac{1}{2}\nabla f \nabla f^T$$
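The two facts used in this step can be sanity-checked numerically. The NumPy sketch below uses hand-picked and random stand-in matrices (the helper names `trace_identity_demo` and `fd_grad_trace` are hypothetical). It shows that the scalar $u^T B v$ equals $tr(v u^T B)$, whereas $tr(u v^T B) = v^T B u$ differs when $B$ is not symmetric, which matters here because $\Sigma^{-1}(\Sigma \circ Q)$ need not be symmetric. It also checks that a central-difference gradient of $tr(XA)$ matches $A^T$.

```python
import numpy as np


def trace_identity_demo():
    # Non-symmetric B: the orientation of the outer product inside the trace matters.
    u = np.array([1.0, 0.0])
    v = np.array([0.0, 1.0])
    B = np.array([[0.0, 1.0],
                  [0.0, 0.0]])
    bilinear = u @ B @ v                   # u^T B v
    right = np.trace(np.outer(v, u) @ B)   # tr(v u^T B) = u^T B v
    wrong = np.trace(np.outer(u, v) @ B)   # tr(u v^T B) = v^T B u
    return float(bilinear), float(right), float(wrong)


def fd_grad_trace(A, X, h=1e-6):
    # Central-difference gradient of f(X) = tr(X A), one entry at a time.
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = h
            G[i, j] = (np.trace((X + E) @ A) - np.trace((X - E) @ A)) / (2 * h)
    return G


rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
X = rng.standard_normal((3, 3))
print(trace_identity_demo())
print(np.allclose(fd_grad_trace(A, X), A.T))
```

The first line prints `(1.0, 1.0, 0.0)`: the $tr(v u^T B)$ orientation reproduces the bilinear form and the swapped one does not. The second prints `True`.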

My main question here is what about $\Sigma \circ Q$? I am basically praying that it does not participate in the differentiation, but that doesn't seem likely since all of these matrices are functions of the same vector $x$.

Second, note that equation 1 can alternatively be written as:

$$\nabla f^T x+ \nabla f^T \Sigma^{-1} z = -\frac{1}{2}\nabla f^T \Sigma^{-1} \nabla f$$

where $z=(\Sigma \circ Q)x$

Following the same steps as above (and again praying that $\frac{\partial (\Sigma \circ Q)}{\partial \Sigma^{-1}} = 0$) you then arrive at

$$\nabla f z^T=-\frac{1}{2}\nabla f \nabla f^T$$

This seems to imply that $z=-\frac{1}{2}\nabla f$, and plugging that back into Eq. 1 gives $\nabla f^T x = 0$?
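Spelled out, assuming $\nabla f \neq 0$: multiplying $\nabla f\,z^T = -\frac{1}{2}\nabla f\,\nabla f^T$ on the left by $\nabla f^T$ and dividing by the scalar $\nabla f^T\nabla f$ gives $z$, and substituting it into the second form of Eq. 1 cancels the quadratic terms:

$$\begin{aligned}
z^T &= -\tfrac{1}{2}\nabla f^T \\
\nabla f^T x + \nabla f^T \Sigma^{-1}\big(-\tfrac{1}{2}\nabla f\big) &= -\tfrac{1}{2}\nabla f^T \Sigma^{-1}\nabla f \\
\nabla f^T x &= 0
\end{aligned}$$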

Best Answer

Let's use a colon to denote the trace/Frobenius product

$$A:B={\rm tr}(A^TB)$$

and define the variables

$$\begin{aligned}
g &= \nabla f, \qquad \alpha = g^Tx, \qquad M = M^T = \Sigma^{-1} \\
\phi &= \alpha + (gx^T):\big(M(Q\circ M^{-1})\big) + \tfrac{1}{2}(gg^T):M = 0
\end{aligned}$$

To find the gradient $\frac{\partial\phi}{\partial M}$, start with the differential (treating $Q$ itself as fixed). The last step uses $dM^{-1} = -M^{-1}\,dM\,M^{-1}$ and $M^T = M$:

$$\begin{aligned}
d\phi &= (gx^T):\big(dM\,(Q\circ M^{-1})\big) + (gx^T):\big(M(Q\circ dM^{-1})\big) + \tfrac{1}{2}(gg^T):dM \\
&= \big(gx^T(Q\circ M^{-1})^T\big):dM + \big(Q\circ(Mgx^T)\big):dM^{-1} + \tfrac{1}{2}(gg^T):dM \\
&= \Big(gx^T(Q\circ M^{-1})^T - M^{-1}\big(Q\circ(Mgx^T)\big)M^{-1} + \tfrac{1}{2}gg^T\Big):dM
\end{aligned}$$

Setting the gradient to zero,

$$\frac{\partial\phi}{\partial M} = gx^T(Q\circ M^{-1})^T - M^{-1}\big(Q\circ(Mgx^T)\big)M^{-1} + \tfrac{1}{2}gg^T = 0,$$

leaves us with

$$\begin{aligned}
-\tfrac{1}{2}gg^T &= gx^T(Q\circ M^{-1})^T - M^{-1}\big(Q\circ(Mgx^T)\big)M^{-1} \\
-\tfrac{1}{2}\nabla f\,\nabla f^T &= \nabla f\,z^T - \Sigma\Big(Q\circ\big(\Sigma^{-1}\nabla f\,x^T\big)\Big)\Sigma
\end{aligned}$$

So you derived the first two terms correctly but missed the third.
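The closed-form gradient can be checked against central finite differences. This is a minimal NumPy sketch under stated assumptions: $Q$ is held fixed (the differential has no $dQ$ term), $\Sigma$, $Q$, $g$ and $x$ are random stand-ins rather than anything from the question's model, and `frob`, `phi`, `grad_phi` and `fd_grad` are hypothetical helper names.

```python
import numpy as np


def frob(A, B):
    """Frobenius product A:B = tr(A^T B)."""
    return np.sum(A * B)


def phi(M, g, x, Q):
    # phi(M) = g^T x + (g x^T):(M (Q o M^{-1})) + 1/2 (g g^T):M, with Q held fixed
    Minv = np.linalg.inv(M)
    return g @ x + frob(np.outer(g, x), M @ (Q * Minv)) + 0.5 * frob(np.outer(g, g), M)


def grad_phi(M, g, x, Q):
    # Closed-form gradient from the answer, valid at a symmetric M
    Minv = np.linalg.inv(M)
    gx = np.outer(g, x)
    return gx @ (Q * Minv).T - Minv @ (Q * (M @ gx)) @ Minv + 0.5 * np.outer(g, g)


def fd_grad(M, g, x, Q, h=1e-6):
    # Central differences, perturbing one entry of M at a time
    n = M.shape[0]
    G = np.zeros_like(M)
    for i in range(n):
        for j in range(n):
            E = np.zeros_like(M)
            E[i, j] = h
            G[i, j] = (phi(M + E, g, x, Q) - phi(M - E, g, x, Q)) / (2 * h)
    return G


rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
Sigma = A @ A.T + n * np.eye(n)   # symmetric positive definite stand-in covariance
M = np.linalg.inv(Sigma)
M = 0.5 * (M + M.T)               # enforce M = M^T exactly
Q = rng.standard_normal((n, n))   # deliberately non-symmetric stand-in for Q
g = rng.standard_normal(n)
x = rng.standard_normal(n)

print(np.allclose(grad_phi(M, g, x, Q), fd_grad(M, g, x, Q), atol=1e-6))
```

Perturbing each entry of $M$ independently is legitimate here: evaluated at a symmetric $M$, the formula is the gradient for arbitrary (not necessarily symmetric) perturbations. Note also that the first term, $gx^T(Q\circ\Sigma)^T$, is just $\nabla f\,z^T$ with $z = (\Sigma\circ Q)x$.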