Derivation of $\frac{\partial }{\partial \Sigma} \left(-\frac{1}{2}\log(\det(M\otimes K +I_T \otimes \Sigma)) \right)$

matrix-calculus, proof-verification

I've done a derivation below, and I'm wondering whether it's correct. If you help me with both derivations, I'll throw in some bonus/bounty points.

For ease of notation let's define $U_{M\Sigma}=M\otimes K +I_T \otimes \Sigma$.

Also, $U_{M\Sigma}$ is symmetric and positive definite.

We know that

\begin{align*}
d\left(-\frac{1}{2}\log(\det(M\otimes K +I_T \otimes \Sigma))\right)&= Tr(-\frac{1}{2}U_{M\Sigma}^{-1}dU_{M\Sigma})\\
&=Tr(-\frac{1}{2}\text{vec}(U_{M\Sigma}^{-1})^\intercal \text{vec}(I_T\otimes d\Sigma))
\end{align*}
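As a quick numeric sanity check of this first step, here is a minimal Julia sketch in the style of the script further down; the sizes are made up, and $M$, $K$ and a base point $\Sigma_0$ are just random symmetric positive definite matrices:

T, D = 3, 2
M  = let A = rand(T,T); A*A' + T*eye(T) end   # random SPD factors, so U is SPD
K  = let A = rand(D,D); A*A' + D*eye(D) end
S0 = let A = rand(D,D); A*A' + D*eye(D) end   # base point Σ₀
dS = let A = rand(D,D); A + A' end            # symmetric perturbation dΣ
f(S) = -0.5*logdet(kron(M,K) + kron(eye(T),S))
U0 = kron(M,K) + kron(eye(T),S0)
h  = 1e-6
fd = (f(S0 + h*dS) - f(S0 - h*dS))/(2h)       # central finite difference
an = -0.5*trace(U0\kron(eye(T),dS))           # Tr(-½ U⁻¹ (I_T ⊗ dΣ))
println("finite difference = $fd, trace formula = $an")

The two printed numbers should agree up to $O(h^2)$.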

Next, we notice that

\begin{align*}
\text{vec}(I_T\otimes d\Sigma)&=\left[
\begin{array}{c}
\text{vec}(e_1\otimes d\Sigma) \\
\vdots\\
\text{vec}(e_T\otimes d\Sigma)
\end{array}
\right]=\left[
\begin{array}{c}
((e_1\otimes I_D)\otimes I_D) \text{vec}(d\Sigma) \\
\vdots\\
((e_T\otimes I_D)\otimes I_D) \text{vec}(d\Sigma)
\end{array}
\right]\\
&=(\text{vec}(I_T)\otimes I_{D^2})\ \text{vec}(d\Sigma)
\end{align*}

Therefore we have
\begin{align*}
d\left(-\frac{1}{2}\log(\det(M\otimes K +I_T \otimes \Sigma))\right)&= Tr(-\frac{1}{2}U_{M\Sigma}^{-1}dU_{M\Sigma})\\
&=Tr(-\frac{1}{2}\text{vec}(U_{M\Sigma}^{-1})^\intercal (\text{vec}(I_T)\otimes I_{D^2})\, \text{vec}(d\Sigma))\\
&=Tr(-\frac{1}{2}(\Gamma_{\Sigma})^\intercal d\Sigma)
\end{align*}

where $\Gamma_{\Sigma}$ is defined by $\text{vec}(\Gamma_{\Sigma})^\intercal=\text{vec}(U_{M\Sigma}^{-1})^\intercal (\text{vec}(I_T)\otimes I_{D^2})$, and therefore $\frac{\partial}{\partial \Sigma}\left(-\frac{1}{2}\log(\det(M\otimes K +I_T \otimes \Sigma))\right)=-\frac{1}{2}\Gamma_{\Sigma}$.

Extra points for help with the derivation below:

\begin{align*}
d(-v^\intercal (M\otimes K+I_T\otimes \Sigma)^{-1}v)&=-v^\intercal \left(-U_{M\Sigma}^{-1}(dU_{M\Sigma})U_{M\Sigma}^{-1}\right)v\\
&=v^\intercal U_{M\Sigma}^{-1}(I_T\otimes d\Sigma)U_{M\Sigma}^{-1}v\\
&=v^\intercal U_{M\Sigma}^{-1}\,\text{vec}(d\Sigma\, W_{M\Sigma}\, I_T)
\end{align*}

where $W_{M\Sigma}$ is the $D\times T$ matrix (its shape fixed by conformability) defined by $\text{vec}(W_{M\Sigma}) = U_{M\Sigma}^{-1}v$. Therefore, we have
\begin{align*}
d(-v^\intercal (M\otimes K+I_T\otimes \Sigma)^{-1}v)&=Tr(W_{M\Sigma}^\intercal d\Sigma W_{M\Sigma} I_T)\\
&=Tr(W_{M\Sigma} W_{M\Sigma}^\intercal d\Sigma)
\end{align*}

And we have $\frac{\partial}{\partial \Sigma}(-v^\intercal (M\otimes K+I_T\otimes \Sigma)^{-1}v)=W_{M\Sigma}W_{M\Sigma}^\intercal$.
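The same kind of finite-difference sketch (again with hypothetical sizes) can be used to check this gradient; $W_{M\Sigma}$ is recovered by reshaping $U_{M\Sigma}^{-1}v$ into a $D\times T$ matrix:

T, D = 3, 2
M  = let A = rand(T,T); A*A' + T*eye(T) end   # random SPD factors, so U is SPD
K  = let A = rand(D,D); A*A' + D*eye(D) end
S0 = let A = rand(D,D); A*A' + D*eye(D) end
v  = rand(T*D)
dS = let A = rand(D,D); A + A' end            # symmetric perturbation dΣ
U(S) = kron(M,K) + kron(eye(T),S)
g(S) = -dot(v, U(S)\v)
W  = reshape(U(S0)\v, D, T)                   # vec(W) = U⁻¹v, so W is D×T
h  = 1e-6
fd = (g(S0 + h*dS) - g(S0 - h*dS))/(2h)       # central finite difference
an = trace((W*W')*dS)                         # Tr(W Wᵀ dΣ)
println("finite difference = $fd, gradient formula = $an")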

Now if we define $L= -\frac{1}{2}\log\det(U_{M\Sigma}) -v^\intercal U_{M\Sigma}^{-1}v$, then
$$\frac{\partial L}{\partial \Sigma}=-\frac{1}{2}\Gamma_{\Sigma}+W_{M\Sigma}W_{M\Sigma}^\intercal.$$

Also, if $\Sigma=\text{Diag}(e^{\tilde{\sigma}_i^2})$, then by the chain rule

$$\frac{\partial L}{\partial \Sigma}\frac{\partial \Sigma}{\partial \tilde{\sigma}_i^2} = \text{diag}\left(\left(\frac{\partial L}{\partial \Sigma}\right)^\intercal \text{Diag}(e^{\tilde{\sigma}_i^2})\right).$$
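Componentwise this is just $\frac{\partial L}{\partial \tilde{\sigma}_i^2} = \left(\frac{\partial L}{\partial \Sigma}\right)_{ii} e^{\tilde{\sigma}_i^2}$, which in Julia is a one-liner; here `dLdS` (holding $\frac{\partial L}{\partial \Sigma}$) and `logv` (holding the $\tilde{\sigma}_i^2$) are hypothetical names:

grad_logv = diag(dLdS) .* exp.(logv)   # ∂L/∂σ̃²ᵢ = (∂L/∂Σ)ᵢᵢ · exp(σ̃²ᵢ)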

Best Answer

Instead of vectorization, take advantage of the block structure of your matrix by introducing a block version of the diag() operator $$B_k={\rm bldiag}(M,k,n)$$ which extracts the $k^{th}$ block along the diagonal of $M\,$ where $1\le k\le n.$
The dimension of the block is $\tfrac{1}{n}$ of the corresponding dimension of the parent matrix.

Also note that for any value of $k,\,\,{\rm bldiag}\big((I_T\otimes \Sigma),k,T\big) = \Sigma,$
and further, these are the only non-zero blocks in the entire matrix.
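As a concrete sketch, the operator can be written in a few lines of Julia, assuming both dimensions of the parent matrix divide evenly by $n$:

# bldiag(A, k, n): the k-th diagonal block of A,
# of size (size(A,1)÷n) × (size(A,2)÷n)
function bldiag(A, k, n)
    m, p = size(A,1) ÷ n, size(A,2) ÷ n
    return A[k*m-m+1:k*m, k*p-p+1:k*p]
end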

Back to your specific problem. To reduce the clutter let's drop most subscripts, ignore the scalar factors, rename the variable $\Sigma\rightarrow S$ so it's not confused with summation, and rename $T\rightarrow n$ so as not to confuse it with the transpose operation. (Below, the colon denotes the Frobenius inner product, $A:B={\rm tr}(A^TB)$.)

$$\eqalign{ U &= M\otimes K + I_n\otimes S \cr \phi &= \log\det U \cr d\phi &= d{\,\rm tr}(\log U) \cr &= U^{-T}:dU \cr &= U^{-T}:(I_n\otimes dS) \cr &= \sum_{k=1}^n{\rm bldiag}\big(U^{-T},k,n\big):{\rm bldiag}\big((I_n\otimes dS),k,n\big) \cr &= \sum_{k=1}^nB_k:dS \cr &= B:dS \cr \frac{\partial\phi}{\partial S} &= B \cr }$$

The second problem is quite similar.

$$\eqalign{ W &= vv^T \cr \psi &= -W:U^{-1} \cr d\psi &= W:U^{-1}\,dU\,U^{-1} \cr &= U^{-T}WU^{-T}:dU \cr &= \sum_{k=1}^n{\rm bldiag}\big(U^{-T}WU^{-T},k,n\big):dS \cr &= C:dS \cr \frac{\partial\psi}{\partial S} &= C \cr }$$

For coding purposes, assume you have $A\in{\mathbb R}^{pm\times pn}$ and you wish to calculate the sum of the block diagonals, i.e.

$$B = \sum_{k=1}^p{\rm bldiag}(A,k,p) \quad\in {\mathbb R}^{m\times n}$$

In almost all programming languages you can access a sub-matrix using index ranges, so you don't need to waste RAM creating vectors and matrices to hold intermediate results.

For example, in Julia (or Matlab) you can write

B = zeros(m,n)
for k = 1:p
  # accumulate the k-th diagonal block of A into B
  B += A[k*m-m+1:k*m, k*n-n+1:k*n]
end

So this single for-loop will calculate the gradients shown above.
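Putting it together, here is a hedged end-to-end sketch that forms both gradients this way and checks them against central finite differences; the sizes and the helper name `blsum` are made up:

n, d = 3, 2
M = let A = rand(n,n); A*A' + n*eye(n) end    # random SPD factors, so U is SPD
K = let A = rand(d,d); A*A' + d*eye(d) end
S = let A = rand(d,d); A*A' + d*eye(d) end
v = rand(n*d)
Ufun(S) = kron(M,K) + kron(eye(n),S)
Ui = inv(Ufun(S))
blsum(A, p) = (m = size(A,1)÷p; q = size(A,2)÷p;
               sum(A[k*m-m+1:k*m, k*q-q+1:k*q] for k = 1:p))
B = blsum(Ui', n)                             # ∂(log det U)/∂S
C = blsum(Ui'*(v*v')*Ui', n)                  # ∂(-vᵀU⁻¹v)/∂S
dS = let A = rand(d,d); A + A' end            # symmetric direction
h  = 1e-6
fdB = (logdet(Ufun(S+h*dS)) - logdet(Ufun(S-h*dS)))/(2h)
fdC = (-dot(v, Ufun(S+h*dS)\v) + dot(v, Ufun(S-h*dS)\v))/(2h)
println("logdet gradient: $(trace(B'*dS))  vs  $fdB")
println("quadratic form:  $(trace(C'*dS))  vs  $fdC")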


Update
Here is a little Julia (v $0.6$) script to check the various "vec-kronecker" expansions.

s,t = 2,3; dS = 4*rand(s,s); dS += dS'; It = eye(t); Is = eye(s);
Iss = kron(Is,Is); M = kron(Is,It[:,1]);
for k = 2:t; M = [ M;  kron(Is,It[:,k]) ]; end
M = kron(M,Is);

x = kron(vec(It), Iss)*vec(dS);
y = vec(kron(It,dS));
z = M*vec(dS);

println( "x = $(x[1:9])\ny = $(y[1:9])\nz = $(z[1:9])" )

which prints

x = [3.07674, 5.3285, 5.3285, 3.51476, 0.0, 0.0, 0.0, 0.0, 0.0]
y = [3.07674, 5.3285, 0.0, 0.0, 0.0, 0.0, 5.3285, 3.51476, 0.0]
z = [3.07674, 5.3285, 0.0, 0.0, 0.0, 0.0, 5.3285, 3.51476, 0.0]

In symbols $$\eqalign{ M &= \pmatrix{I_s\otimes e_1\cr I_s\otimes e_2\cr\vdots\cr I_s\otimes e_t} \otimes I_s \cr x &= \big({\rm vec}(I_t)\otimes I_{s^2}\big)\,{\rm vec}(dS) \cr y &= {\rm vec}(I_t\otimes dS) \cr z &= M\,{\rm vec}(dS) \cr x &\ne y = z \cr }$$ That is, the expansion $x$ used in the question does not reproduce ${\rm vec}(I_t\otimes dS)$; the correct coefficient matrix is the $M$ shown above.
