Hessian matrix of $\Lambda \mapsto y'(I + X\Lambda X')^{-1}y$

hessian-matrix, kronecker-product, linear-algebra, matrices, matrix-calculus

I have $$f=y'M^{-1}y$$ where $$M = I + X\Lambda X'$$ for $y \in \mathbb{R}^n$, $X\in \mathbb{R}^{n\times p}$, and $\Lambda$ is a $p\times p$ symmetric positive-definite matrix. I'm trying to compute the Hessian matrix of this quadratic form with respect to the half-vectorization $\mathrm{vech}(\Lambda)$, i.e., $$\partial^2 f/(\partial \mathrm{vech}(\Lambda)\partial \mathrm{vech}(\Lambda)')$$ Here's what I've got so far ($D_p$ is the $p^2\times \frac{1}{2}p(p+1)$ duplication matrix):
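For concreteness, here is a small numpy sketch of this setup that I'll use below to check the formulas numerically. The dimensions and random data are arbitrary placeholders, the vec is column-major (Fortran order) to match the formulas, and `duplication_matrix`, `vech`, `unvech` are hand-rolled helpers, not library routines:

```python
import numpy as np

def duplication_matrix(p):
    """D_p such that vec(A) = D_p @ vech(A) for a symmetric p x p matrix A
    (column-major vec; vech stacks the lower triangle column by column)."""
    D = np.zeros((p * p, p * (p + 1) // 2))
    k = 0
    for j in range(p):
        for i in range(j, p):
            D[j * p + i, k] = 1.0
            D[i * p + j, k] = 1.0
            k += 1
    return D

rng = np.random.default_rng(0)
n, p = 6, 3                                  # arbitrary sizes
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
L = rng.standard_normal((p, p))
Lam = L @ L.T + p * np.eye(p)                # symmetric positive-definite Lambda
Dp = duplication_matrix(p)

def vech(A):
    """vech(A): stack the lower triangle of A column by column."""
    return np.concatenate([A[j:, j] for j in range(A.shape[0])])

def unvech(v):
    """Rebuild the symmetric matrix whose half-vectorization is v."""
    return (Dp @ v).reshape(p, p, order="F")

def f(v):
    """f(vech(Lambda)) = y' (I + X Lambda X')^{-1} y."""
    M = np.eye(n) + X @ unvech(v) @ X.T
    return y @ np.linalg.solve(M, y)
```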

$$
\begin{align}
\mathsf{d}f &= -y'M^{-1}(\mathsf{d}M)M^{-1}y \\
&= -y'M^{-1}X(\mathsf{d}\Lambda)X'M^{-1}y \\
&= -(y'M^{-1}X \otimes y'M^{-1}X)\mathsf{d}\mathrm{vec}(\Lambda)\\
&=-(y'M^{-1}X \otimes y'M^{-1}X)D_p\,\mathsf{d}\mathrm{vech}(\Lambda),\\
\frac{\partial f}{\partial \mathrm{vech}(\Lambda)} &= -D_p'\left(X'M^{-1}y \otimes X'M^{-1}y \right).
\end{align}
$$
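Continuing the numpy sketch above, this gradient formula can be checked against central finite differences of $f(\mathrm{vech}(\Lambda))$:

```python
# analytic gradient:  df/dvech(Lambda) = -D_p' (X'M^{-1}y  kron  X'M^{-1}y)
M = np.eye(n) + X @ Lam @ X.T
w = X.T @ np.linalg.solve(M, y)              # X' M^{-1} y
grad_analytic = -Dp.T @ np.kron(w, w)

# central finite differences of f(vech(Lambda))
v0, eps = vech(Lam), 1e-6
grad_fd = np.array([(f(v0 + eps * e) - f(v0 - eps * e)) / (2 * eps)
                    for e in np.eye(v0.size)])

print(np.max(np.abs(grad_analytic - grad_fd)))   # should be tiny (finite-difference error)
```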

Redefine $f = -D_p'\left(X'M^{-1}y \otimes X'M^{-1}y \right)$. Then,

$$
\begin{align}
\mathsf{d}f &= -D_p'\mathsf{d}\left(X'M^{-1}y \otimes X'M^{-1}y \right) \\
&= D_p'\left\{X'M^{-1}(\mathsf{d}M)M^{-1}y\otimes X'M^{-1}y + X'M^{-1}y \otimes X'M^{-1}(\mathsf{d}M)M^{-1}y \right\}\\
&= D_p'\left\{X'M^{-1}X(\mathsf{d}\Lambda)X'M^{-1}y \otimes X'M^{-1}y + X'M^{-1}y \otimes X'M^{-1}X(\mathsf{d}\Lambda)X'M^{-1}y \right\}\\
&= D_p'\left\{\left(y'M^{-1}X \otimes X'M^{-1}X \right)\mathsf{d}\mathrm{vec}(\Lambda) \otimes X'M^{-1}y \right. \\
&\left. \quad + X'M^{-1}y \otimes \left(y'M^{-1}X \otimes X'M^{-1}X \right)\mathsf{d}\mathrm{vec}(\Lambda) \right\}.
\end{align}
$$

How do I take $\mathsf{d}\mathrm{vec}(\Lambda)$ out?

To simplify the question, how do I take $\mathsf{d}\mathrm{vec}(\Lambda)$ out of the following:
$$
\mathsf{d}f = \left\{(A \otimes B)\,\mathsf{d}\mathrm{vec}(\Lambda)\right\} \otimes C + C \otimes \left\{(A \otimes B)\,\mathsf{d}\mathrm{vec}(\Lambda)\right\}?
$$
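A small numpy experiment with made-up shapes illustrates what the factored form should look like: for column vectors $u$ and $C$, $u\otimes C=(I\otimes C)\,u$ and $C\otimes u=(C\otimes I)\,u$, which pulls $\mathsf{d}\mathrm{vec}(\Lambda)$ out on the right (these are the vector-case formulas from the answer below):

```python
import numpy as np

rng = np.random.default_rng(1)
p = 3
A = rng.standard_normal((1, p))        # plays the role of y'M^{-1}X  (1 x p)
B = rng.standard_normal((p, p))        # plays the role of X'M^{-1}X  (p x p)
C = rng.standard_normal(p)             # plays the role of X'M^{-1}y  (p x 1)
dvec = rng.standard_normal(p * p)      # an arbitrary d vec(Lambda)

u = np.kron(A, B) @ dvec               # (A kron B) d vec(Lambda), a p-vector

lhs = np.kron(u, C) + np.kron(C, u)
rhs = (np.kron(np.eye(p), C.reshape(-1, 1)) + np.kron(C.reshape(-1, 1), np.eye(p))) \
      @ np.kron(A, B) @ dvec

print(np.max(np.abs(lhs - rhs)))       # essentially zero
```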

Update

Using greg's answer, I now have
$$
\begin{align}
X'M^{-1}y\otimes (y'M^{-1}X \otimes X'M^{-1}X)\mathsf{d}\mathrm{vec}(\Lambda) &= (X'M^{-1}y\otimes I_p)(y'M^{-1}X \otimes X'M^{-1}X)\mathsf{d}\mathrm{vec}(\Lambda) \\
&= (X'M^{-1}yy'M^{-1}X \otimes X'M^{-1}X)\mathsf{d}\mathrm{vec}(\Lambda),
\end{align}
$$

whose coefficient matrix is symmetric (a Kronecker product of two symmetric matrices), but
$$
\begin{align}
(y'M^{-1}X \otimes X'M^{-1}X)\mathsf{d}\mathrm{vec}(\Lambda) \otimes X'M^{-1}y &= (I_p \otimes X'M^{-1}y)(y'M^{-1}X \otimes X'M^{-1}X)\mathsf{d}\mathrm{vec}(\Lambda),
\end{align}
$$

which I cannot simplify.
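Continuing the numpy sketch from the top of the question, both identities in this update can be checked numerically for a random direction $\mathsf{d}\mathrm{vec}(\Lambda)$:

```python
# check the two identities above for an arbitrary d vec(Lambda)
M = np.eye(n) + X @ Lam @ X.T
w = X.T @ np.linalg.solve(M, y)          # X' M^{-1} y      (p-vector)
S = X.T @ np.linalg.solve(M, X)          # X' M^{-1} X      (p x p, symmetric)
dvec = rng.standard_normal(p * p)

t = np.kron(w.reshape(1, -1), S) @ dvec  # (y'M^{-1}X kron X'M^{-1}X) d vec(Lambda)

# first identity:  w kron t  =  (w w' kron S) d vec(Lambda)
print(np.max(np.abs(np.kron(w, t) - np.kron(np.outer(w, w), S) @ dvec)))

# second identity: t kron w  =  (I_p kron w)(w' kron S) d vec(Lambda)
print(np.max(np.abs(np.kron(t, w)
                    - np.kron(np.eye(p), w.reshape(-1, 1))
                      @ (np.kron(w.reshape(1, -1), S) @ dvec))))
```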

Best Answer

$ \def\o{{\tt1}} \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\vecc#1{\op{vec}\LR{#1}} \def\vech#1{\op{vech}\LR{#1}} \def\size#1{\op{size}\LR{#1}} \def\trace#1{\op{Tr}\LR{#1}} \def\frob#1{\left\| #1 \right\|_F} \def\qiq{\quad\implies\quad} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} \def\gradLR#1#2{\LR{\grad{#1}{#2}}} $Given the following matrix sizes $$\eqalign{ m,n &= \size{A} \\ p,q &= \size{B} \\ }$$ the formulas for vectorizing their Kronecker products are $$\eqalign{ {\rm vec}(A\otimes B) &= \Big(I_n\otimes K_{q,m}\otimes I_p\Big)\,\Big({\rm vec}(A)\otimes I_q\otimes I_p\Big)\:{\rm vec}(B) \\ {\rm vec}(B\otimes A) &= \Big(I_q\otimes K_{n,p}\otimes I_m\Big)\,\Big(I_q\otimes I_p\otimes{\rm vec}(A)\Big)\:{\rm vec}(B) \\ }$$ where $K_{q,m}$ and $K_{n,p}$ are Commutation Matrices and the $I$'s are identity matrices.
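Here is a quick numerical check of both formulas (numpy; the sizes are arbitrary and `commutation_matrix` is a hand-rolled helper, not a library routine):

```python
import numpy as np

def commutation_matrix(m, n):
    """K_{m,n} such that K @ vec(A) = vec(A.T) for an m x n matrix A (column-major vec)."""
    K = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            K[i * n + j, j * m + i] = 1.0
    return K

vec = lambda A: A.reshape(-1, order="F")

rng = np.random.default_rng(2)
m, n = 2, 3          # size of A
p, q = 4, 2          # size of B
A = rng.standard_normal((m, n))
B = rng.standard_normal((p, q))

# vec(A kron B)
lhs1 = vec(np.kron(A, B))
rhs1 = np.kron(np.eye(n), np.kron(commutation_matrix(q, m), np.eye(p))) \
       @ np.kron(vec(A).reshape(-1, 1), np.eye(q * p)) @ vec(B)
print(np.max(np.abs(lhs1 - rhs1)))   # essentially zero

# vec(B kron A)
lhs2 = vec(np.kron(B, A))
rhs2 = np.kron(np.eye(q), np.kron(commutation_matrix(n, p), np.eye(m))) \
       @ np.kron(np.eye(q * p), vec(A).reshape(-1, 1)) @ vec(B)
print(np.max(np.abs(lhs2 - rhs2)))   # essentially zero
```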

Your calculations up to this point look good, so I'm confident that you can take it from here.

Update

In the case that $a,b$ are column vectors $$\eqalign{ m,\o &= \size{a}, \qquad p,\o = \size{b} \\ }$$ the formulas simplify to $$\eqalign{ \vecc{a\otimes b} &= \Big(a\otimes I_p\Big)\,b \\ \vecc{b\otimes a} &= \Big(I_p\otimes a\Big)\,b \\ }$$ If $H$ is the Hessian wrt the vec() operation and $K$ is the Hessian wrt the vech() operation, then they are related by $$K = D^THD$$ Both Hessians are symmetric (assuming the calculations were done correctly).
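The vector-case formulas are easy to verify directly in numpy (arbitrary sizes):

```python
import numpy as np
rng = np.random.default_rng(3)
m, p = 4, 3
a = rng.standard_normal(m)
b = rng.standard_normal(p)

print(np.allclose(np.kron(a, b), np.kron(a.reshape(-1, 1), np.eye(p)) @ b))  # vec(a kron b) = (a kron I_p) b
print(np.allclose(np.kron(b, a), np.kron(np.eye(p), a.reshape(-1, 1)) @ b))  # vec(b kron a) = (I_p kron a) b
```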

Update 2

Here's how I would arrange the calculations.

First, introduce a convention which uses uppercase letters for matrix variables, lowercase for vectors, and Greek letters for scalars.

Second, introduce the extremely versatile Frobenius $(:)$ product $$\eqalign{ A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{A^TB} \\ A:A &= \frob{A}^2 \qquad \{ {\rm Frobenius\;norm} \} \\ A:B &= B:A \;=\; B^T:A^T \\ C:\LR{AB} &= \LR{CB^T}:A \;=\; \LR{A^TC}:B \\ }$$

Third, define the following variables for typing convenience $$\eqalign{ Y &= yy^T \\ A &= \Lambda &\qiq a = \vecc{A} \\ M &= I+XAX^T &\qiq dM = X\:dA\:X^T \\ W &= M^{-1} &\qiq dW = -W\:dM\:W \\ F &= -X^TWX \\ G &= -X^T WYW X \\ H &= {G\otimes F + F\otimes G} \\ }$$ $\sf NB:$ Except for $X$, all of these matrix variables are symmetric.

Now calculate the gradient of the scalar function $$\eqalign{ \phi &= Y:W \\ d\phi &= Y:dW \\ &= -Y:\LR{WX\:dA\:X^TW} \qquad\qquad\qquad\qquad\qquad\qquad \\ &= -\LR{X^TWYWX}:dA \\ &= G:dA \\ &= g:da \\ \grad{\phi}{a} &= g \\ }$$

Then the Hessian $$\eqalign{ g &= -\vecc{X^T\c{W}Y\c{W}X} \\ dg &= -\vecc{X^T\:\c{dW}\:YWX + X^TWY\:\c{dW}\:X} \\ &= \vecc{X^T\c{WX\:dA\:X^TW}YWX + X^TWY\c{WX\:dA\:X^TW}X} \\ &= \vecc{F\:\c{dA}\:G + G\:\c{dA}\:F} \\ &= \LR{G\otimes F + F\otimes G}\c{da} \\ &= H\:da \\ \grad{g}{a} &= H \\ }$$

The above gradient $(g)$ and Hessian $(H)$ were calculated with respect to $\vecc{A}.\:$ Calculating the gradient $(q)$ and Hessian $(K)$ wrt $\vech{A}$ is a simple matter of multiplying by the Duplication matrix $(D)$ $$\eqalign{ b &= \vech{A} \qiq a = Db \\ d\phi &= g:da = g:\LR{D\:db} = \LR{D^Tg}:db \\ \grad{\phi}{b} &= D^Tg \;\doteq\; q \\ \\ dq &= D^Tdg = D^TH\:da = D^THD\:db \\ \grad{q}{b} &= D^THD \;\doteq\; K \\ }$$
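As an end-to-end sanity check (reusing `duplication_matrix`, `vech`, `f` and the random data from the numpy sketch in the question; the step size is an arbitrary choice), $K=D^THD$ should match a finite-difference Hessian of $\phi=y'M^{-1}y$ up to discretization error:

```python
# build the variables above and compare the analytic Hessian with finite differences
Y = np.outer(y, y)
M = np.eye(n) + X @ Lam @ X.T
W = np.linalg.inv(M)
F = -X.T @ W @ X
G = -X.T @ W @ Y @ W @ X
H = np.kron(G, F) + np.kron(F, G)        # Hessian wrt vec(Lambda)
K = Dp.T @ H @ Dp                        # Hessian wrt vech(Lambda)

# second-order central differences of f(vech(Lambda))
v0, eps = vech(Lam), 1e-4
r = v0.size
E = np.eye(r)
K_fd = np.zeros((r, r))
for i in range(r):
    for j in range(r):
        K_fd[i, j] = (f(v0 + eps * (E[i] + E[j])) - f(v0 + eps * (E[i] - E[j]))
                      - f(v0 - eps * (E[i] - E[j])) + f(v0 - eps * (E[i] + E[j]))) / (4 * eps**2)

print(np.max(np.abs(K - K_fd)))          # small (finite-difference error)
print(np.allclose(H, H.T), np.allclose(K, K.T))   # both Hessians are symmetric
```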
