[Math] Differentiate wrt Cholesky decomposition

matrices · matrix-decomposition · matrix-calculus

I wish to find the maximum likelihood estimator of the precision matrix (inverse covariance matrix). One option is to maximise the following:
$$
f(\Theta) = \frac{N}{2}\log|\Theta|-\sum_i \mathbf{x}_i^T\Theta\mathbf{x}_i
$$
(assume that the mean is zero without loss of generality). The $\mathbf{x}_i$ are constant.

I know how to differentiate the above by exploiting the fact that $\mathbf{x}_i^T\Theta\mathbf{x}_i=Tr(\Theta\mathbf{x}_i\mathbf{x}_i^T)$. However, if I instead pose the problem as optimising over the Cholesky factor $L$, where $LL^T=\Theta$, so that
$$
f(L) = {N}\log|L|-\sum_i \mathbf{x}_i^TLL^T\mathbf{x}_i
$$
what is $\frac{\partial f(L)}{\partial L}$? It's really the second term that I am struggling with.
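
To make the setup concrete, here is a minimal NumPy sketch (the data and function names are made up for illustration) that just confirms the two parametrisations agree when $\Theta = LL^T$:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 500, 3
x = rng.standard_normal((N, d))          # rows are the x_i (illustrative zero-mean data)

def f_theta(theta):
    # f(Theta) = (N/2) log|Theta| - sum_i x_i^T Theta x_i
    return 0.5 * N * np.linalg.slogdet(theta)[1] - np.einsum('ni,ij,nj->', x, theta, x)

def f_chol(L):
    # f(L) = N log|L| - sum_i x_i^T L L^T x_i  (|L| = product of the diagonal for triangular L)
    return N * np.sum(np.log(np.diag(L))) - np.einsum('ni,ij,nj->', x, L @ L.T, x)

# any lower-triangular L with positive diagonal will do; this one comes from the sample precision
L = np.linalg.cholesky(np.linalg.inv(np.cov(x, rowvar=False)))
print(np.isclose(f_theta(L @ L.T), f_chol(L)))   # True: both parametrisations give the same value
```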

Best Answer

To reduce clutter, define the symmetric matrix $$X = \sum_i x_i x_i^T$$ It is also handy to know that $$\log\det L = {\rm tr}\log L$$

Now use the Frobenius inner product (denoted by a colon, $A:B = {\rm tr}(A^TB)$) to write the function and its differential as $$\eqalign{ f &= N\,{\rm tr}\log L - X:LL^T \cr\cr df &= N\,L^{-T}:dL - X:2\,{\rm sym}(dL\,L^T) \cr &= N\,L^{-T}:dL - 2\,{\rm sym}(X):dL\,L^T \cr &= (N\,L^{-T} - 2\,XL):dL \cr }$$ The last step uses ${\rm sym}(X)=X$ (since $X$ is symmetric) and $X:dL\,L^T = XL:dL$. Since $df = \frac{\partial f}{\partial L}:dL$, the gradient must be $$\eqalign{ \frac{\partial f}{\partial L} &= N\,L^{-T} - 2\,XL \cr }$$
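
As a sanity check of the final formula, here is a minimal NumPy sketch (with made-up data and illustrative names) comparing it against central finite differences:

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 200, 4
x = rng.standard_normal((N, d))
X = x.T @ x                                   # X = sum_i x_i x_i^T

def f(L):
    # f(L) = N log det L - X : L L^T, treating L as a general square matrix
    return N * np.linalg.slogdet(L)[1] - np.sum(X * (L @ L.T))

L = np.linalg.cholesky(np.linalg.inv(np.cov(x, rowvar=False)))
grad = N * np.linalg.inv(L).T - 2 * X @ L     # N L^{-T} - 2 X L

# compare with central finite differences, entry by entry
eps = 1e-6
num = np.zeros_like(L)
for i in range(d):
    for j in range(d):
        E = np.zeros_like(L)
        E[i, j] = eps
        num[i, j] = (f(L + E) - f(L - E)) / (2 * eps)
print(np.allclose(grad, num, atol=1e-4))      # True
```

If $L$ is constrained to be lower triangular, only the lower-triangular entries are free parameters, so one would keep just the lower-triangular part of this gradient (e.g. `np.tril(grad)`).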