About standard vectorization of a matrix and its derivative

Tags: derivatives, matrices, matrix-calculus, vectorization

I read about this notation: if $X \in \mathbb{R}^{d\times d}$ then $X^b \in \mathbb{R}^{d^2}$ is the standard vectorization of $X$. I searched for the term "standard vectorization" and only found results about vectorization in general, for example this Wiki page. So the first part of the question is simply a clarification of the notation. For instance, if $X = \begin{bmatrix} a & b \\ c & d\end{bmatrix}$, is $X^b = [a\ b\ c\ d]$, or $[a\ c\ b\ d]$, or even something completely different?

Assuming that it's one of the first two, I don't understand how it is useful. Isn't it just another way to view the same thing? Could you provide some examples that demonstrate its purpose?

Finally, as stated in the title, I am interested in understanding how to calculate the derivative with respect to the vectorization. For instance, let $f: \mathbb{R}^{d\times d} \to \mathbb{R}^{d\times d}$ be a function. What is $\frac{\partial f}{\partial X^b}$? I know how to calculate $\frac{\partial f}{\partial X}$, so my first try was to use the chain rule:

$$\frac{\partial f}{\partial X^b} = \frac{\partial f}{\partial X}\frac{\partial X}{\partial X^b}$$

I found this question, which seems relevant, but I think something is wrong: if $\frac{\partial X}{\partial X^b}$ is $[I_d\ I_d\ \dots\ I_d]$, then the dimensions don't match for the matrix multiplication.

Any help would be much appreciated.

Best Answer

Given a matrix-valued function $F$ of a matrix-valued variable $X$, e.g.

$$F = X^3$$

rather than jumping straight to the chain rule, consider its differential

$$dF = dX\,X^2 + X\,dX\,X + X^2\,dX$$

Use vectorization on this differential expression to obtain

$${\rm vec}(dF) = \Big((X^2)^T\otimes I\;+\;X^T\otimes X\;+\;I\otimes X^2\Big)\,{\rm vec}(dX) $$

Then the gradient of the vectorized function is

$$\eqalign{ \frac{\partial\,{\rm vec}(F)}{\partial\,{\rm vec}(X)} &= (X^2)^T\otimes I\;+\;X^T\otimes X\;+\;I\otimes X^2 }$$

Index notation is another option. With summation implied over repeated indices, and $X^2_{j\ell}$ denoting the $(j,\ell)$ entry of the matrix $X^2$,

$$\eqalign{ dF_{i\ell} &= dX_{ij}\,X^2_{j\ell} + X_{ij}\,dX_{jk}\,X_{k\ell} + X^2_{ik}\,dX_{k\ell} \\ \frac{\partial F_{i\ell}}{\partial X_{pq}} &= (\delta_{ip}\delta_{jq})\,X^2_{j\ell} + X_{ij}\,(\delta_{jp}\delta_{kq})\,X_{k\ell} + X^2_{ik}\,(\delta_{kp}\delta_{\ell q}) \\ &= \delta_{ip}X^2_{q\ell} + X_{ip}X_{q\ell} + X^2_{ip}\delta_{\ell q} \\ }$$
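
As a sanity check, here is a minimal NumPy sketch that compares the Kronecker-product formula for $F = X^3$ against a finite-difference Jacobian. It assumes that ${\rm vec}$ stacks columns (so $\begin{bmatrix} a & b \\ c & d\end{bmatrix}$ becomes $[a\ c\ b\ d]$), which is the convention under which ${\rm vec}(AXB) = (B^T\otimes A)\,{\rm vec}(X)$ holds. The helper names `vec`, `jacobian_kron` and `jacobian_fd` are made up for this illustration.

```python
import numpy as np

def vec(M):
    # Column-stacking vectorization (assumed convention):
    # [[a, b], [c, d]] -> [a, c, b, d]
    return M.reshape(-1, order="F")

def F(X):
    # The example function from the answer: F = X^3
    return X @ X @ X

def jacobian_kron(X):
    # d vec(F) / d vec(X) = (X^2)^T (x) I + X^T (x) X + I (x) X^2
    d = X.shape[0]
    I = np.eye(d)
    X2 = X @ X
    return np.kron(X2.T, I) + np.kron(X.T, X) + np.kron(I, X2)

def jacobian_fd(X, eps=1e-6):
    # Central finite differences, one entry of vec(X) at a time
    d = X.shape[0]
    J = np.zeros((d * d, d * d))
    for k in range(d * d):
        e = np.zeros(d * d)
        e[k] = eps
        dX = e.reshape(d, d, order="F")  # inverse of vec
        J[:, k] = vec(F(X + dX) - F(X - dX)) / (2 * eps)
    return J

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 3))

J_kron = jacobian_kron(X)
J_fd = jacobian_fd(X)
max_err = np.max(np.abs(J_kron - J_fd))
print("Jacobian shape:", J_kron.shape)
print("max abs difference:", max_err)
```

The Jacobian is a $d^2\times d^2$ matrix, which is why the dimensions work out once both $F$ and $X$ are vectorized. If you prefer row-stacking (`order="C"`), the two factors in each Kronecker product swap places, so the convention has to be fixed before comparing formulas.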