I have that $y=(A\otimes A)x$ where $A\in\mathbb{R}^{n\times n}$ and $x\in\mathbb{R}^{n^2}$. I want to find $\frac{dy}{dA}$ in matrix (or tensor) form. I have looked at other questions on here where the solution uses the Magnus–Neudecker technique of vectorising each side. The issue is that my term already contains a Kronecker product, so the identity $\operatorname{vec}(ABC)=(C^{\mathrm{T}}\otimes A)\operatorname{vec}(B)$ that is used frequently in the various solutions isn't useful in this case. Any help would be much appreciated.
Vector – Matrix Differentiation that includes the Kronecker product
kronecker-product, linear-algebra, matrices, matrix-calculus
Related Solutions
This took me a while to figure out. What finally helped me was the supplemental material of this paper.
To summarize very briefly, what you want to do is treat the vector $x$ as having a multi-dimensional index $x_{i_1, i_2, \ldots, i_n}$. Then you can sequentially multiply $x$ by the matrices $A_1$, $A_2$, etc. after an appropriate permutation of the indices; see the discussion in Fino & Algazi (1976). Overall, this gives substantial savings in memory and computational complexity: you never materialize the full $N\times N$ Kronecker matrix ($N=\prod_i n_i$), and the matrix–vector product costs roughly $O(N\sum_i n_i)$ operations instead of $O(N^2)$, so you save about a factor of 2 in the exponent.
I think the best way to understand the approach is to look at the code for this. I've put up a snippet below that is hopefully helpful: https://gist.github.com/ahwillia/f65bc70cb30206d4eadec857b98c4065
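Independent of the linked gist, the core idea can be sketched in a few lines. This is a minimal illustration (not the gist's code) for just two factors, with sizes chosen arbitrarily: instead of building $A\otimes B$, reshape $x$ into a matrix $X$ and compute $AXB^T$, which matches $(A\otimes B)x$ under NumPy's row-major flattening.

```python
import numpy as np

# Assumed sizes for illustration only.
nA, nB = 3, 4
rng = np.random.default_rng(0)
A = rng.standard_normal((nA, nA))
B = rng.standard_normal((nB, nB))
x = rng.standard_normal(nA * nB)

# Naive approach: materialize the (nA*nB) x (nA*nB) Kronecker matrix.
y_naive = np.kron(A, B) @ x

# Reshape trick: view x as an nA x nB array X (multi-index x_{j,l});
# then (A ⊗ B) x corresponds to A X B^T, flattened in row-major order.
X = x.reshape(nA, nB)
y_fast = (A @ X @ B.T).reshape(-1)

assert np.allclose(y_naive, y_fast)
```

For more than two factors, one repeats this reshape–multiply step per factor, permuting the axis being contracted to the front each time.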
We will need a few things: \begin{align} \text{ conjugate-rule:}&&& (A\otimes B )^*= A^* \otimes B^* &\text{note they don't change places!} \\ \text{ mixed-product-rule:}&&& (A\otimes B )(C \otimes D) = (AC)\otimes (BD) &\text{if the matrices have compatible dimensions} \\ \text{ trace-rule:}&&& {\tt tr}(A \otimes B) = {\tt tr}(A)\, {\tt tr}(B) \\ \text{ trace-product:}&&& {\tt tr}(A^* B) = {\tt tr}(A B^*) \\ \text{ trace-linearity:}&&& {\tt tr}(\lambda A) = \lambda\, {\tt tr}(A) &\text{for scalar $\lambda$} \end{align}
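The first three rules are easy to sanity-check numerically. A quick sketch with randomly chosen complex matrices (sizes are arbitrary, and ${}^*$ is taken as the conjugate transpose, as in the derivation below):

```python
import numpy as np

rng = np.random.default_rng(1)
# Random complex matrices with compatible dimensions (assumed sizes).
A = rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3))
B = rng.standard_normal((4, 5)) + 1j * rng.standard_normal((4, 5))
C = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))
D = rng.standard_normal((5, 4)) + 1j * rng.standard_normal((5, 4))

ct = lambda M: M.conj().T  # the * (conjugate transpose) operation

# conjugate-rule: (A ⊗ B)* = A* ⊗ B*  (the factors keep their places)
assert np.allclose(ct(np.kron(A, B)), np.kron(ct(A), ct(B)))

# mixed-product-rule: (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD)
assert np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D))

# trace-rule: tr(S ⊗ T) = tr(S) tr(T) for square S, T
S, T = A @ C, B @ D
assert np.isclose(np.trace(np.kron(S, T)), np.trace(S) * np.trace(T))
```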
Now note that since $\alpha$ and $\beta$ are $1\times 2$, $${\tt (I)}\qquad (\alpha^T\otimes \beta)^* (\alpha^T\otimes \beta) = (\bar\alpha\otimes\beta^*)(\alpha^T\otimes \beta) = (\underbrace{\bar\alpha\alpha^T}_{1\times 1} \otimes \underbrace{\beta^*\beta}_{2\times 2}) = (\bar\alpha\alpha^T)(I_1 \otimes \beta^*\beta) $$
Now, on the one hand: $${\tt (II)}\qquad (\alpha^T\otimes \beta)^* (I_1 \otimes \beta^*\beta) = (\bar\alpha\otimes \beta^*) (\beta^*\beta \otimes I_1) = ((\underbrace{\bar\alpha \beta^*}_{1 \times 1})\beta \otimes \beta^*) = (\bar\alpha \beta^*)(\beta \otimes \beta^*) $$ and on the other $$ {\tt (III)}\qquad (\beta \otimes \beta^*)(\alpha^T\otimes \beta) = (\underbrace{\beta\alpha^T}_{1\times 1}\otimes \beta^*\beta) = (\beta\alpha^T)(I_1 \otimes \beta^*\beta) $$ Combining (I), (II) and (III), we find that by 'ping-ponging' the middle Kronecker product back and forth we can extract more and more scalar factors, reducing the powers:
\begin{align} \big((\alpha^T\otimes \beta)^*\big)^k (\alpha^T\otimes \beta)^k &= (\bar\alpha\alpha^T) \big((\alpha^T\otimes \beta)^*\big)^{k-1} (I_1 \otimes \beta^*\beta)(\alpha^T\otimes \beta)^{k-1} \\&= (\bar\alpha\alpha^T)(\bar\alpha \beta^*)^{k-1}(\beta\alpha^T)^{k-1}(I_1 \otimes \beta^*\beta) \\&= (\bar\alpha\alpha^T)(\bar\alpha \beta^*)^{k-1}(\beta\alpha^T)^{k-1}(\beta^* \beta) \end{align}
From which your identity can be immediately deduced by the trace linearity and product rule: $$ {\tt tr}(\underbrace{\beta^*\beta}_{2\times 2}) = {\tt tr}(\underbrace{\beta\beta^*}_{1 \times 1}) = \beta\beta^*$$
Note that since $\beta$ is a row vector, we have $(I_1 \otimes \beta^*\beta) = (\beta^*\beta \otimes I_1) = \beta^*\beta =(\beta^*\otimes \beta )$. In fact, all of the above calculations immediately generalize to the case of $1\times M$ vectors.
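The reduction formula above is also easy to verify numerically. A minimal sketch with random complex $1\times 2$ row vectors standing in for $\alpha$ and $\beta$ (${}^*$ again denotes the conjugate transpose):

```python
import numpy as np

rng = np.random.default_rng(2)
# Random complex 1x2 row vectors playing the roles of alpha and beta.
alpha = rng.standard_normal((1, 2)) + 1j * rng.standard_normal((1, 2))
beta = rng.standard_normal((1, 2)) + 1j * rng.standard_normal((1, 2))

M = np.kron(alpha.T, beta)  # (alpha^T ⊗ beta), a 2x2 matrix
k = 5

# Left-hand side: ((alpha^T ⊗ beta)^*)^k (alpha^T ⊗ beta)^k
lhs = np.linalg.matrix_power(M.conj().T, k) @ np.linalg.matrix_power(M, k)

# Right-hand side: the extracted scalar factors times beta^* beta
s1 = (alpha.conj() @ alpha.T).item()        # scalar  ᾱ αᵀ
s2 = (alpha.conj() @ beta.conj().T).item()  # scalar  ᾱ β*
s3 = (beta @ alpha.T).item()                # scalar  β αᵀ
rhs = s1 * s2 ** (k - 1) * s3 ** (k - 1) * (beta.conj().T @ beta)

assert np.allclose(lhs, rhs)
```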
If you want, I can also post an answer showing how to obtain the equality $${\tt tr}((A^*)^k A^k) = 4\left(1+\zeta^v\right)^{2k-1}\left(1+\zeta^{-v}\right)^{2k-1}$$ from your original post without the need of using Kronecker products.
Best Answer
If you "unvectorize" the vectors $x$ and $y$ into square matrices $X,Y$ (using the same column-stacking $\operatorname{vec}$ as in the identity you quoted, so that $\operatorname{vec}(X) = x$), you could write this as $$ Y = AXA^T. $$ If we want a derivative in some kind of matrix form, we can compute the partial derivative of $Y$ with respect to the $i,j$ entry of $A$. To that end, for $h \in \Bbb R$, we can write $$ \begin{align} Y(A + h E_{ij}) &= (A + hE_{ij})X(A + hE_{ij})^T \\ & = AXA^T + h(E_{ij} X A^T + AXE_{ij}^T) + o(h) \\ & = Y(A) + h \frac{\partial Y}{\partial a_{ij}} + o(h). \end{align} $$ With that, we have an expression for the desired partial derivative. In terms of the Kronecker delta, the $p,q$ entry of $\frac{\partial Y}{\partial a_{ij}}$ is given by $$ \left[\frac{\partial Y}{\partial a_{ij}}\right]_{p,q} = \delta_{ip} \left(\sum_{k=1}^n x_{jk}a_{qk} \right) + \delta_{iq}\left(\sum_{k=1}^n a_{pk}x_{kj} \right). $$
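Both steps of this answer can be sanity-checked numerically. The sketch below (with an arbitrary size $n=3$) first confirms that $Y = AXA^T$ reproduces $(A\otimes A)x$ under column-major unvectorization, then compares $\frac{\partial Y}{\partial a_{ij}} = E_{ij}XA^T + AXE_{ij}^T$ against a finite-difference approximation:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3
A = rng.standard_normal((n, n))
x = rng.standard_normal(n * n)

# Column-major "unvectorize": vec(X) = x, matching vec(ABC) = (C^T ⊗ A) vec(B).
X = x.reshape(n, n, order="F")
y = np.kron(A, A) @ x
Y = A @ X @ A.T
assert np.allclose(Y, y.reshape(n, n, order="F"))

# Check dY/da_{ij} = E_{ij} X A^T + A X E_{ij}^T via finite differences.
i, j, h = 1, 2, 1e-6
E = np.zeros((n, n))
E[i, j] = 1.0
analytic = E @ X @ A.T + A @ X @ E.T
Ah = A + h * E
numeric = ((Ah @ X @ Ah.T) - Y) / h  # error is O(h), tiny for small h
assert np.allclose(analytic, numeric, atol=1e-4)
```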