Calculating the gradient of a particular multivariate normal

derivatives, matrix-calculus

Suppose I have negative twice the log-likelihood of some multivariate normal data $x_1, \ldots, x_n$. I assume the covariance matrix has a low-rank-plus-diagonal structure $LL^\intercal + \Psi$, where $\Psi$ is diagonal and $L$ is lower-triangular with few columns (low rank). The mean vector is $\mu$.

\begin{align}
f(\mu, L, \Psi) &= -2 \log \mathcal{L}(x_1, \ldots, x_n ; \mu, L, \Psi)\\
&= \text{const} + n\log \det(LL^\intercal + \Psi) + \sum_{i=1}^n (x_i - \mu)^\intercal (LL^\intercal + \Psi)^{-1} (x_i - \mu) \\
&= \text{const} + n\log \det(LL^\intercal + \Psi) + \text{trace}\left[ (LL^\intercal + \Psi)^{-1}\sum_{i=1}^n (x_i - \mu)(x_i - \mu)^\intercal\right]
\end{align}

Because the parameter matrices are constrained ($L$ lower-triangular, $\Psi$ diagonal), they live in a lower-dimensional space, so the gradient can be written in terms of the free parameters as

\begin{align*}
\nabla f(\mu, L, \Psi)
&= \begin{bmatrix}
\frac{\partial f}{\partial \mu} \\
\frac{\partial f}{\partial \text{vech} (L)} \\
\frac{\partial f}{\partial \text{diag} (\Psi)}
\end{bmatrix}
\end{align*}

Finding the first block of that vector is straightforward:

\begin{align*}
\frac{\partial f}{\partial \mu}
&= -2n (LL^\intercal + \Psi)^{-1} \bar{x} + 2 n (LL^\intercal + \Psi)^{-1}\mu
\end{align*}
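This expression can be spot-checked against a central finite difference. The dimensions, random data, and the particular $L$ and $\Psi$ below are illustrative assumptions only:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n = 4, 2, 50
Lmat = np.tril(rng.normal(size=(d, r)))        # lower-triangular low-rank factor
Psi = np.diag(rng.uniform(0.5, 1.5, size=d))   # diagonal Psi
mu = rng.normal(size=d)
X = rng.normal(size=(d, n))                    # columns are the x_i
P = Lmat @ Lmat.T + Psi
Pinv = np.linalg.inv(P)

def f(m):
    # -2 log-likelihood, up to the additive constant
    Y = m[:, None] - X
    return n * np.linalg.slogdet(P)[1] + np.trace(Pinv @ Y @ Y.T)

xbar = X.mean(axis=1)
grad = -2 * n * Pinv @ xbar + 2 * n * Pinv @ mu   # the formula above

eps = 1e-6
fd = np.array([(f(mu + eps * e) - f(mu - eps * e)) / (2 * eps) for e in np.eye(d)])
assert np.allclose(grad, fd, atol=1e-4)
```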

However, I'm stuck on the other two pieces.

Best Answer

$ \def\bs{\boldsymbol} \def\o{{\tt1}} \def\P{{\Psi}} \def\M{P^{-1}} \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\vecc#1{\op{vec}\LR{#1}} \def\vech#1{\op{vech}\LR{#1}} \def\diag#1{\op{diag}\LR{#1}} \def\Diag#1{\op{Diag}\LR{#1}} \def\trace#1{\op{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\qif{\quad\iff\quad} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} \def\CLR#1{\c{\LR{#1}}} \def\Sk{\sum_{k=1}^n} $For typing convenience, define the all-ones vector $\o$ and the variables $$\eqalign{ P &= {\P+LL^T} \\ dP &= d\P+dL\,L^T+L\,dL^T \\ p &= \diag{\P} \qif \P = \Diag{p} \\ h &= \vech{L} = {Ev} \\ v &= \vecc{L} \:= \c{E^Th} \\ }$$ where $E$ is the Elimination Matrix and the last equality in $\c{\rm red}$ is only true if $L$ is lower triangular.
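These conventions can be sketched in numpy; the dimension and the column-major `vec` convention (matching the usual math definition) are my assumptions:

```python
import numpy as np

d = 3
# Rows of E select the on-or-below-diagonal positions of vec(A) (column-major),
# so that vech(A) = E @ vec(A) for any square A.
lower = [i + j * d for j in range(d) for i in range(j, d)]
E = np.eye(d * d)[lower]

L = np.tril(np.arange(1.0, d * d + 1).reshape(d, d))
v = L.flatten(order="F")   # vec(L)
h = E @ v                  # vech(L)

# The equality in red: vec(L) = E^T vech(L) because L is lower triangular...
assert np.allclose(E.T @ h, v)
# ...but it fails for a matrix with nonzero entries above the diagonal.
A = np.arange(1.0, d * d + 1).reshape(d, d)
assert not np.allclose(E.T @ (E @ A.flatten(order="F")), A.flatten(order="F"))
```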

Replace the vector summation by a matrix whose columns are the $x_k$ vectors
$$\eqalign{
X &= {\bs[}\,x_1\;x_2\:\cdots\:x_n\,{\bs]} \\
M &= {\bs[}\;\mu\;\;\mu\;\;\cdots\;\;\mu\,{\bs]} \;= \mu\o^T \\
Y &= {M-X} \qiq dY = d\mu\,\o^T \\
Z &= \M Y \\
YY^T &= \Sk \LR{x_k-\mu}\LR{x_k-\mu}^T \\
}$$
and introduce the Frobenius product, which is a concise notation for the trace
$$\eqalign{
A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{A^TB} \\
A:A &= \|A\|^2_F \\
}$$
The properties of the underlying trace function allow the terms in a Frobenius product to be rearranged in many different ways, e.g.
$$\eqalign{
A:B &= B:A \;=\; \vecc{A}:\vecc{B} \\
A:B &= A^T:B^T \\
C:\LR{AB} &= \LR{CB^T}:A = \LR{A^TC}:B \\
\Diag{a}:B &= a:\diag{B} \\
}$$
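These rearrangement rules are easy to verify numerically; the shapes below are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
frob = lambda A, B: np.trace(A.T @ B)   # A:B = Tr(A^T B)

A, B = rng.normal(size=(3, 5)), rng.normal(size=(3, 5))
assert np.isclose(frob(A, B), (A * B).sum())        # A:B = sum_ij A_ij B_ij
assert np.isclose(frob(A, B), frob(B, A))           # A:B = B:A
assert np.isclose(frob(A, B), frob(A.T, B.T))       # A:B = A^T:B^T

A, B, C = rng.normal(size=(3, 4)), rng.normal(size=(4, 5)), rng.normal(size=(3, 5))
assert np.isclose(frob(C, A @ B), frob(C @ B.T, A))  # C:(AB) = (CB^T):A
assert np.isclose(frob(C, A @ B), frob(A.T @ C, B))  # C:(AB) = (A^T C):B

a, D = rng.normal(size=3), rng.normal(size=(3, 3))
assert np.isclose(frob(np.diag(a), D), a @ np.diag(D))  # Diag(a):B = a:diag(B)
```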


Write the function using the above notation, then calculate its differential
$$\eqalign{
f &= f_0 + n\log\det(P) + YY^T:\M \\
\\
df &= n\,d\LR{\log\det P} + YY^T:{d\M} + \M:d\LR{YY^T} \\
&= n\LR{P^{-1}:dP} \;+\; YY^T:\LR{-\M\,dP\,\M} \;+\; \M:\LR{dY\,Y^T+Y\,dY^T} \\
&= nP^{-1}:\c{dP} \;-\; ZZ^T:\c{dP} \;+\; 2\M Y:dY \\
&= \LR{nP^{-1}-ZZ^T}:\CLR{d\P+dL\,L^T+L\,dL^T} \;+\; 2\M Y:\LR{d\mu\,\o^T} \\
&= \LR{nP^{-1}-ZZ^T}:d\P \;+\; 2\LR{nP^{-1}-ZZ^T}L:dL \;+\; 2\M Y\o:d\mu \\
&= \diag{nP^{-1}-ZZ^T}:dp \;+\; 2\vecc{nP^{-1}L-ZZ^TL}:dv \;+\; 2\M Y\o:d\mu \\
&= \diag{nP^{-1}-ZZ^T}:dp \;+\; 2\vecc{nP^{-1}L-ZZ^TL}:E^Tdh \;+\; 2\M Y\o:d\mu \\
&= \diag{nP^{-1}-ZZ^T}:dp \;+\; 2E\,\vecc{nP^{-1}L-ZZ^TL}:dh \;+\; 2\M Y\o:d\mu \\
}$$
Now isolate the respective gradients as
$$\eqalign{
\grad{f}{p} &= \diag{nP^{-1}-ZZ^T} \\
\grad{f}{v} &= 2\vecc{nP^{-1}L-ZZ^TL} \\
\grad{f}{h} &= 2E\,\vecc{nP^{-1}L-ZZ^TL} \\
\grad{f}{\mu} &= 2\M Y\o \\
}$$
Note that since $E$ simply selects the on-or-below-diagonal entries of its argument, the $h$-gradient can also be written as $$E\,\vecc{nP^{-1}L-ZZ^TL} = \vech{nP^{-1}L-ZZ^TL}$$ The entries of the (non-symmetric) matrix argument above the diagonal are simply discarded, since they do not correspond to free parameters of the lower-triangular $L$.
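The whole derivation can be checked end-to-end with finite differences. The problem sizes, random data, and the square lower-triangular parameterization of $L$ below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 4, 30
lower = [i + j * d for j in range(d) for i in range(j, d)]
E = np.eye(d * d)[lower]        # elimination matrix: vech(A) = E @ vec(A)
m = len(lower)                  # number of free entries in L

X = rng.normal(size=(d, n))     # data columns x_k

def unpack(theta):
    mu, h, p = theta[:d], theta[d:d + m], theta[d + m:]
    L = (E.T @ h).reshape(d, d, order="F")   # rebuild lower-triangular L from vech(L)
    return mu, L, p

def f(theta):
    mu, L, p = unpack(theta)
    P = L @ L.T + np.diag(p)
    Y = mu[:, None] - X
    return n * np.linalg.slogdet(P)[1] + np.trace(np.linalg.solve(P, Y @ Y.T))

theta = np.concatenate([rng.normal(size=d),              # mu
                        rng.normal(size=m),              # h = vech(L)
                        rng.uniform(1.0, 2.0, size=d)])  # p = diag(Psi), kept positive

mu, L, p = unpack(theta)
P = L @ L.T + np.diag(p)
Pinv = np.linalg.inv(P)
Y = mu[:, None] - X
Z = Pinv @ Y
G = n * Pinv - Z @ Z.T
g = np.concatenate([2 * Pinv @ Y.sum(axis=1),            # df/dmu = 2 P^{-1} Y 1
                    2 * E @ (G @ L).flatten(order="F"),  # df/dh  = 2 E vec(G L)
                    np.diag(G)])                         # df/dp  = diag(G)

eps = 1e-6
fd = np.array([(f(theta + eps * e) - f(theta - eps * e)) / (2 * eps)
               for e in np.eye(len(theta))])
assert np.allclose(g, fd, atol=1e-4)
```

The gradient blocks are stacked in the order $(\mu, \operatorname{vech} L, \operatorname{diag}\Psi)$, matching the question.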