Derivative of matrix w.r.t. its own vectorized version

derivatives, matrix-calculus, vectorization

I am unable to find the derivative of an $m \times m$ real matrix $A$ with respect to $(\mathrm{vec}(A))^T$ (where $T$ denotes the transpose and $\mathrm{vec}$ stacks the columns) without using tensors, i.e. while remaining in 2d notation. I assume it would involve the Kronecker product, but is there a straightforward answer, or a convention?

Best Answer

$\def\bb{\mathbb}$ You don't want to use tensors, but they can be quite illuminating.

A matrix and its vectorized form are related by $$a = {\rm vec}(A) \quad\iff\quad A={\rm Mat}(a)$$ This can alternatively be written using third-order $\bb V$ectorization/$\bb M$atricization tensors and single/double dot products $$\eqalign{ a &= \bb V:A \quad\iff\quad &A=\bb M\cdot a \\ a &= A:\bb M \quad\iff\quad &A=a\cdot\bb V \\ }$$ The gradient of a matrix with respect to itself yields the fourth-order identity tensor $$\frac{\partial A}{\partial A} = \bb E\quad\implies\quad X =(\bb E:X)=(X:\bb E)$$ just as the gradient of a vector wrt itself yields the second-order identity matrix $$\frac{\partial a}{\partial a} = I\quad\implies\quad x =(I\cdot x)=(x\cdot I)$$ Using these tensors, one can write the vector-by-vector, matrix-by-vector, and vector-by-matrix gradients in terms of the matrix-by-matrix gradient. $$\eqalign{ \frac{\partial a}{\partial b} &= \frac{\partial(\bb V:A)}{\partial(B:\bb M)} &= \bb V:&\left(\frac{\partial A}{\partial B}\right):\bb M \\ \frac{\partial A}{\partial b} &= \frac{\partial A}{\partial(B:\bb M)} &= &\left(\frac{\partial A}{\partial B}\right):\bb M \\ \frac{\partial a}{\partial B} &= \frac{\partial(\bb V:A)}{\partial B} &= \bb V:&\left(\frac{\partial A}{\partial B}\right) \\ }$$ Combining the above ideas yields $$\eqalign{ \frac{\partial A}{\partial a} &= \left(\frac{\partial A}{\partial A}\right):\bb M &= (\bb E):\bb M = \bb M \\ \frac{\partial a}{\partial A} &= \bb V:\left(\frac{\partial A}{\partial A}\right) &= \bb V:(\bb E) = \bb V \\ }$$ So how are these tensors defined?

For matrices in $\;\bb R^{m\times n}\;$ they are $$\eqalign{ \bb V_{\ell jk} &= \begin{cases} 1\quad{\rm if}\;\;\ell+m=j+mk \\ 0\quad{\rm otherwise} \\ \end{cases} \\ \bb M_{jk\ell} &= \begin{cases} 1\quad{\rm if}\;\;j-1={\rm mod}(\ell-1,m)\;\;\&\;\;k-1={\rm div}(\ell-1,m) \\ 0\quad{\rm otherwise} \\ \end{cases} \\ \bb E_{jkpq} &= \begin{cases} 1\quad{\rm if}\;\;j=p\;\;\&\;\;k=q \\ 0\quad{\rm otherwise} \\ \end{cases} \\ I_{jp} &= \begin{cases} 1\quad{\rm if}\;\;j=p \\ 0\quad{\rm otherwise} \\ \end{cases} \\ }$$ and the index ranges are $$\eqalign{ 1&\le\; j,p \;&\le m &\qquad\big({\rm the}\,row\;{\rm index}\big) \\ 1&\le\; k,q \;&\le n &\qquad\big({\rm the}\,column\;{\rm index}\big) \\ 1&\le\; \ell \;&\le m\!\cdot\!n &\qquad\big({\rm the}\,long\;{\rm index}\big) \\ }$$ Both conditions for $\bb M$ are equivalent to the single condition for $\bb V$, since column stacking means $\ell-1=(j-1)+m(k-1)$. In some sense the third-order tensors are the more fundamental quantities, since given $\big(\bb M,\bb V\big)$ the remaining tensors can be calculated as $$\eqalign{ \bb E &= \bb M\cdot\bb V \qquad&\big({\rm contract\,over\,long\,index}\big) \\ I &= \bb V:\bb M \qquad&\big({\rm contract\,over\,row/column\,indexes}\big) \\ }$$ Here $I$ denotes the identity matrix of whatever size its indices require: $I=\bb V:\bb M$ is $mn\times mn$, whereas the two factors in the next formula are $m\times m$ and $n\times n$. Similarly, $I$ is more fundamental than $\bb E$ since $$\eqalign{ {\bb E}_{jkpq} &= I_{jp}I_{kq} \\ }$$ Also note that you only need to calculate one of the fundamental tensors, since they are equal after a cyclic rotation of the indexes $$\eqalign{ {\bb M}_{jk \ell} &\doteq {\bb V}_{\ell jk} \\ }$$ Since the definition of $\bb V$ is somewhat simpler, it is usually the one that is computed.
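If it helps to see these identities with concrete numbers, here is a minimal sketch in plain Python. It is only an illustration, not part of the derivation: the sizes $m=2$, $n=3$, the sample entries of $A$, and all variable names are assumptions made for the example. Tensors are stored as dictionaries keyed by 1-based index tuples so that the code mirrors the index formulas, and $\bb M$ is obtained from $\bb V$ by the cyclic rotation of indexes described above.

```python
from itertools import product

m, n = 2, 3                    # illustrative sizes (assumption)
J = range(1, m + 1)            # row index j, p
K = range(1, n + 1)            # column index k, q
L = range(1, m * n + 1)        # long index ell

# V[l, j, k] = 1 iff l + m == j + m*k   (column-stacking vec)
V = {(l, j, k): int(l + m == j + m * k) for l, j, k in product(L, J, K)}

# M is V with its indexes cyclically rotated: M[j, k, l] = V[l, j, k]
M = {(j, k, l): V[l, j, k] for l, j, k in product(L, J, K)}

# Sample matrix A[j, k] = 10*j + k, so each entry shows its own position
A = {(j, k): 10 * j + k for j, k in product(J, K)}

# a = V : A   (contract over the row and column indexes)
a = {l: sum(V[l, j, k] * A[j, k] for j, k in product(J, K)) for l in L}

# A = M . a   (contract over the long index)
A_back = {(j, k): sum(M[j, k, l] * a[l] for l in L)
          for j, k in product(J, K)}

# E = M . V   (fourth-order identity), I = V : M   (mn x mn identity)
E = {(j, k, p, q): sum(M[j, k, l] * V[l, p, q] for l in L)
     for j, k, p, q in product(J, K, J, K)}
I_long = {(l, r): sum(V[l, j, k] * M[j, k, r] for j, k in product(J, K))
          for l, r in product(L, L)}

vec_A = [a[l] for l in L]
print(vec_A)                   # columns of A stacked on top of each other
print(A_back == A)             # Mat(vec(A)) recovers A
```

Because $A=\bb M\cdot a$ is linear in $a$, the entries of `M` are exactly the partial derivatives $\partial A_{jk}/\partial a_\ell$, which is the sense in which $\partial A/\partial a=\bb M$.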
