Matrix Differentiation of Kronecker Product

chain rulekronecker productmatrix-calculus

I have a question about differentiating an expression which has multiple kronecker products.

I have the following objective function I would like to differentiate with respect to $\mathbf{Q}$:
\begin{equation*}
\lVert\mathbf{y}-\mathbf{A}(\mathbf{Q}\otimes\mathbf{Q}\otimes\mathbf{Q}\otimes\mathbf{Q})\mathbf{x}\rVert^2_2
\end{equation*}

where $\mathbf{y}\in\mathbb{R}^m$, $\mathbf{A}\in\mathbb{R}^{m\times K^4}$, $\mathbf{Q}\in\mathbb{R}^{K\times K}$ and $\mathbf{x}\in\mathbb{R}^{K^4}$. I am confused with how the chain rule works with respect to matrix differentiation. This is how I proceeded:

Let $ f=\lVert\mathbf{y}-\mathbf{A}(\mathbf{Q}\otimes\mathbf{Q}\otimes\mathbf{Q}\otimes\mathbf{Q})\mathbf{x}\rVert^2_2$
and $\mathbf{B}=\mathbf{Q}\otimes\mathbf{Q}\otimes\mathbf{Q}\otimes\mathbf{Q}$. Therefore $\frac{df}{d\mathbf{Q}}=\frac{df}{d\mathbf{B}}\frac{d\mathbf{B}}{d\mathbf{Q}}$

When I calculate $\frac{df}{d\mathbf{B}}=\mathbf{A}^T(\mathbf{y}-\mathbf{ABx})\mathbf{x}^T$ I gain a $\mathbb{R}^{K^4\times K^4}$ matrix not a $\mathbb{R}^{K\times K}$ matrix that I am hoping for.
Therefore I am using the chain rule wrong because of the change in dimensions i.e scalar to matrix.

Thank you for your help in advance.

Best Answer

Short answer: the derivative of $Q\otimes Q\otimes Q\otimes Q$ with respect to $Q$ is a mess, at first sight...

Let's start simple. Let $Q$ be a $K\times K$ matrix with entries $Q_{ij}$ and let $E^{ab}$ be the $K\times K$ matrix with all $0$ entries, except the entry $(a,b)$ which is $1$; in other words, $(E^{ab})_{ij} = \delta_a^i\delta_b^j$.

Then I claim that $$ \frac{\partial(Q\otimes Q)}{\partial Q_{ij}} = E^{ij}\otimes Q+Q\otimes E^{ij} . $$ I leave it to you to see why, because trying to write out the involved matrices will probably crash the entire Stack Exchange network...

Jokes aside, this is really immediate to see: just write $Q\times Q$ as in the first formula of the definition and think which elements are affected by $Q_{ij}$. There is the entire $(i,j)$th block, so you get $E^{ij}\otimes Q$, but there is also the $(i,j)$th entry in each block, which gives you $Q\otimes E^{ij}$.

Now, if $A$ and $B$ are matrices which are functions of $Q$, by the same reasoning you get $$ \frac{\partial(A\otimes B)}{\partial Q_{ij}} = \frac{\partial A}{\partial Q_{ij}}\otimes B + A\otimes \frac{\partial B}{\partial Q_{ij}} . $$

So you can iterate for instance $$ \begin{split} \frac{\partial (Q\otimes Q\otimes Q)}{\partial Q_{ij}} &= \frac{\partial(Q\otimes Q)}{\partial Q_{ij}}\otimes Q + (Q\otimes Q)\otimes \frac{\partial Q}{\partial Q_{ij}} \\ &= (E^{ij}\otimes Q+Q\otimes E^{ij})\otimes Q + (Q\otimes Q)\otimes E^{ij} \\ &= E^{ij}\otimes Q\otimes Q + Q\otimes E^{ij}\otimes Q + Q\otimes Q\otimes E^{ij}. \end{split} $$

Now you can prove by induction that $$ \frac{\partial \bigl(\bigotimes_{n=1}^N Q\bigr)}{\partial Q_{ij}} = \sum_{n=1}^N \left(\bigotimes_{h=1}^{n-1} Q\right) \otimes E^{ij} \otimes \left(\bigotimes_{h=n+1}^{N} Q\right). $$

Written more concisely, $$ \frac{\partial Q^{\otimes N}}{\partial Q_{ij}} = \sum_{n=1}^N Q^{\otimes (n-1)}\otimes E^{ij} \otimes Q^{\otimes (N-n)} . $$