Derivative of a vectorized function with respect to a Cholesky decomposition

cholesky decomposition, derivatives, jacobian, matrix-calculus, vectorization

Let $\Sigma$ be a symmetric, positive definite $p\times p$ covariance matrix, and let $f(\Sigma)$ be its Cholesky factor. That is, $f(\Sigma)$ is a lower triangular $p\times p$ matrix such that $\Sigma = f(\Sigma) f(\Sigma)^{\top}$. Further, let $\Lambda := \operatorname{diag}(f(\Sigma))$ be a diagonal matrix holding the diagonal elements of $f(\Sigma)$ on its diagonal, i.e. the standard deviations given by $\Sigma$, and finally, let $P = \Lambda^{-1} \Sigma \Lambda^{-1}$ denote the correlation matrix.

I am wondering if, with $\mathcal{P} := P - I_p + \Lambda$, the derivative
$$
\frac{\mathrm{d}\operatorname{vec}\left( \mathcal{P} \right)}{\mathrm{d} \operatorname{vec} \left( f(\Sigma) \right)}
$$

is known, where $\operatorname{vec}$ is the vectorization function and $I_p$ the $p$-dimensional identity matrix.

I found several related questions (for example here, here, and here); however, due to my limited knowledge of matrix calculus, I don't know how to combine these sources or whether a closed-form solution exists.

Best Answer

Let's use a naming convention where matrices and vectors are denoted by upper and lower case Latin letters, respectively. Further, the symbol $\odot$ will denote the Hadamard product and $\otimes$ the Kronecker product.

For ease of typing, use $\{S,A,P\}$ instead of $\,\{\Sigma,{\large\Lambda},{\cal P}\}\,$ and $\,X=f(\Sigma)\,$.

Then rewrite the problem using these conventions.
$$\eqalign{
S &= XX^T,\quad A = I\odot X,\quad V=A^{-1} \\
P &= VSV + A - I \\
}$$
Each of these matrices (except for $X$) is symmetric, and $(A,V,I)$ are diagonal.
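As a quick illustration of these definitions, here is a minimal NumPy sketch. The helper name `build_matrices` and the random test matrix are hypothetical choices made for this example only; the check that `X` is recovered by `np.linalg.cholesky` assumes `X` is lower triangular with a positive diagonal.

```python
import numpy as np

def build_matrices(X):
    """Return (S, A, V, P) as defined above for a square matrix X."""
    I = np.eye(X.shape[0])
    S = X @ X.T            # S = X X^T
    A = I * X              # Hadamard product with I keeps only the diagonal of X
    V = np.linalg.inv(A)   # V = A^{-1}, again diagonal
    P = V @ S @ V + A - I  # P = V S V + A - I
    return S, A, V, P

# Hypothetical test input: random lower-triangular X with a positive diagonal.
rng = np.random.default_rng(0)
p = 3
X = np.tril(rng.normal(size=(p, p)))
X[np.diag_indices(p)] = np.abs(np.diag(X)) + 1.0

S, A, V, P = build_matrices(X)
print("S symmetric:", np.allclose(S, S.T))
print("P symmetric:", np.allclose(P, P.T))
print("A diagonal: ", np.allclose(A, np.diag(np.diag(A))))
print("X is the Cholesky factor of S:", np.allclose(np.linalg.cholesky(S), X))
```

The last check relies on the uniqueness of the Cholesky factor with positive diagonal, so it would fail for an `X` that is not lower triangular.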

Apply the vec operation, writing $s={\rm vec}(S)$, $a={\rm vec}(A)$, $v={\rm vec}(V)$ and $p={\rm vec}(P)$ ($K$ denotes the Commutation matrix)
$$\eqalign{
y &= {\rm vec}(I) \\
x &= {\rm vec}(X) \quad\implies\quad{\rm vec}(X^T) &\doteq Kx \\
a &= y \odot x \;=\; {\rm Diag}(y)\,x &\doteq Yx \\
da &= Y\,dx \\
\\
s &= (I\otimes X)Kx \;=\; (X\otimes I)\,x \\
ds &= \Big((I\otimes X)K+(X\otimes I)\Big)\,dx &\doteq N\,dx\\
\\
p &= (V\otimes V)s + a-y &\doteq Bs + a-y \\
&= (VS\otimes I)v + a-y &\doteq Hv + a-y \\
&= (I\otimes VS)v + a-y &\doteq Jv + a-y \\
}$$
Finally, calculate the differentials of $v$ and $p$
$$\eqalign{
dv &= {\rm vec}(-V\,dA\,V) = -(V\otimes V)\,da \\
&= -B\,da \;=\; -BY\,dx \\
\\
dp &= da + B\,ds + H\,dv + J\,dv \\
&= Y\,dx + BN\,dx - (H+J)BY\,dx \\
&= \Big(Y + BN - (H+J)BY\Big)\,dx \\
}$$
and the Jacobian with respect to $x$
$$\eqalign{
\frac{\partial p}{\partial x} &= Y + BN - (H+J)BY \\
&= Y + (V\otimes V)\Big((I\otimes X)K+(X\otimes I)\Big) - (VS\otimes I + I\otimes VS)(V\otimes V)Y \\
&= Y + (V\otimes VX)K+(VX\otimes V) - \big(VSV\otimes V + V\otimes VSV\big)Y \\
}$$
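The final expression can be sanity-checked numerically. The sketch below is not part of the derivation: it compares the last line of the formula against central finite differences in NumPy. The function names (`commutation_matrix`, `closed_form_jacobian`, `numerical_jacobian`) and the step size `h` are hypothetical choices for this example. It assumes a column-major vec and, as in the derivation, treats all $p^2$ entries of $X$ as independent variables.

```python
import numpy as np

def vec(M):
    """Column-major vectorization, matching vec(AXB) = (B^T kron A) vec(X)."""
    return M.reshape(-1, order="F")

def commutation_matrix(p):
    """K with K @ vec(X) == vec(X.T) for a p x p matrix X."""
    K = np.zeros((p * p, p * p))
    for i in range(p):
        for j in range(p):
            # X[i, j] sits at index i + j*p in vec(X) and j + i*p in vec(X.T)
            K[j + i * p, i + j * p] = 1.0
    return K

def P_of(X):
    """P = V S V + A - I with S = X X^T, A = I * X (Hadamard), V = A^{-1}."""
    I = np.eye(X.shape[0])
    A = I * X
    V = np.diag(1.0 / np.diag(X))
    return V @ X @ X.T @ V + A - I

def closed_form_jacobian(X):
    """Y + (V kron VX) K + (VX kron V) - (VSV kron V + V kron VSV) Y."""
    p = X.shape[0]
    K = commutation_matrix(p)
    Y = np.diag(vec(np.eye(p)))
    V = np.diag(1.0 / np.diag(X))
    VX = V @ X
    VSV = VX @ VX.T  # equals V S V because V is symmetric
    return (Y + np.kron(V, VX) @ K + np.kron(VX, V)
            - (np.kron(VSV, V) + np.kron(V, VSV)) @ Y)

def numerical_jacobian(X, h=1e-6):
    """Central finite differences of vec(P) with respect to vec(X)."""
    p = X.shape[0]
    x = vec(X)
    J = np.zeros((p * p, p * p))
    for k in range(p * p):
        e = np.zeros(p * p)
        e[k] = h
        Xp = (x + e).reshape(p, p, order="F")
        Xm = (x - e).reshape(p, p, order="F")
        J[:, k] = (vec(P_of(Xp)) - vec(P_of(Xm))) / (2.0 * h)
    return J

# Hypothetical test input: random lower-triangular X with a positive diagonal.
rng = np.random.default_rng(0)
p = 3
X = np.tril(rng.normal(size=(p, p)))
X[np.diag_indices(p)] = np.abs(np.diag(X)) + 1.0

max_err = np.max(np.abs(closed_form_jacobian(X) - numerical_jacobian(X)))
print("max abs difference:", max_err)
```

If $X$ is restricted to be lower triangular, the columns of the Jacobian that correspond to the structurally zero entries above the diagonal can simply be dropped. In the scalar case $p=1$ the formula reduces to $1 + 2/x - 2/x = 1$, which agrees with $P = x$.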
