Derivative of Kronecker product of vector with itself

Tags: derivatives, kronecker-product, matrices, matrix-calculus, vectors

I'm struggling with the following problem. Suppose $\pmb{x}$ and $\pmb{y}$ are vectors of the same length and $\pmb{y}$ is not a function of $\pmb{x}$. What is the following derivative?

$$
\frac{\partial}{\partial \pmb{x}} (\pmb{y} - \pmb{x}) \otimes (\pmb{y} - \pmb{x})
$$

My thought was to define $\pmb{z} = \pmb{y} - \pmb{x}$ and $\pmb{f} = \pmb{z} \otimes \pmb{z}$, and first differentiate with respect to $\pmb{z}$:

\begin{align}
d\pmb{f} &= ((d\pmb{z}) \otimes \pmb{z}) + (\pmb{z} \otimes (d\pmb{z})) \\
&= (\pmb{I} \otimes \pmb{z})d\pmb{z} + (\pmb{z} \otimes \pmb{I})d\pmb{z} \\
&= ((\pmb{I} \otimes \pmb{z}) + (\pmb{z} \otimes \pmb{I}))d\pmb{z} \\
\frac{\partial \pmb{f}}{\partial \pmb{z}} &= (\pmb{I} \otimes \pmb{z}) + (\pmb{z} \otimes \pmb{I})
\end{align}

and then, since $d\pmb{z} = -d\pmb{x}$, obtain by the chain rule:

$$
\frac{\partial}{\partial \pmb{x}} (\pmb{y} - \pmb{x}) \otimes (\pmb{y} - \pmb{x}) = -\left( (\pmb{I} \otimes (\pmb{y} - \pmb{x})) + ((\pmb{y} - \pmb{x}) \otimes \pmb{I}) \right)
$$

This seems sensible. However, it is part of a Hessian I am deriving, and I derived its corresponding transpose element to be:

$$
-2\left(\pmb{I} \otimes (\pmb{y} - \pmb{x})\right)
$$

Which is very similar but not the same. Am I missing something obvious?

Best Answer

The gradient that you found is correct. I re-derive it here to make this answer self-contained...

First note that the Kronecker product of two vectors can be expanded in two ways: $$a\otimes b = (I_a\otimes b)\,a = (a\otimes I_b)\,b$$ where $I_a$ is the identity matrix whose dimensions are compatible with the $a$ vector, while $I_b$ is compatible with the $b$ vector.
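As a quick numerical sanity check of the two expansions above, here is a minimal NumPy sketch. The vector lengths, the random seed and the variable names are arbitrary choices made for illustration; they are not part of the derivation.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(3)   # vector of length 3 (arbitrary)
b = rng.standard_normal(4)   # vector of length 4 (arbitrary)

I_a = np.eye(a.size)         # identity compatible with a
I_b = np.eye(b.size)         # identity compatible with b

lhs = np.kron(a, b)                              # a ⊗ b, length 12
via_a = np.kron(I_a, b.reshape(-1, 1)) @ a       # (I_a ⊗ b) a
via_b = np.kron(a.reshape(-1, 1), I_b) @ b       # (a ⊗ I_b) b

print(np.allclose(lhs, via_a), np.allclose(lhs, via_b))
```

Both comparisons should agree up to floating-point rounding. Note that the vectors are reshaped to columns inside `np.kron` so that the factors are genuine matrices of size $12\times 3$ and $12\times 4$.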

Define two new vectors (note that $z$ has the opposite sign to the one used in the question, which leaves $z\otimes z$ unchanged)
$$\eqalign{ z &= x-y \quad\implies dz = dx\cr f &= z\otimes z \cr }$$
Then use the Kronecker expansion to calculate the differential and gradient of $f$.
$$\eqalign{ df &= z\otimes dz + dz\otimes z = (z\otimes I + I\otimes z)\,dx \cr G=\frac{\partial f}{\partial x} &= (z\otimes I + I\otimes z) = (x-y)\otimes I + I\otimes(x-y) \cr }$$

Let $e_k$ denote the $k^{th}$ column of the $I$ matrix and $w={\rm vec}(I)$. Use these to vectorize the $G$ matrix.
$$\eqalign{ G &= (z\otimes I + I\otimes z), \quad M = \pmatrix{I\otimes e_1\cr I\otimes e_2\cr\vdots\cr I\otimes e_n} \cr g &= {\rm vec}(G) = \Big(M + w\otimes I\Big)\,z \cr }$$

Now find the differential and gradient of the $g$ vector.
$$\eqalign{ dg &= \Big(M + w\otimes I\Big)\,dx \cr H = \frac{\partial g}{\partial x} &= \Big(M + w\otimes I\Big) \cr }$$

So that's the Hessian in matrix form, of size $n^3\times n$. The true Hessian is a 3rd order tensor.
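The formulas for $G$ and $H$ can be checked numerically. The sketch below is only an illustration under stated assumptions: the dimension `n = 3`, the random seed, and the helper names `f`, `G_analytic` and `jacobian_fd` are all made up here, and the Jacobian is approximated by central differences. It also compares $G$ with the expression $-2\left(I\otimes(y-x)\right)$ from the question, purely entry by entry.

```python
import numpy as np

def f(x, y):
    """f = z ⊗ z with z = x - y."""
    z = x - y
    return np.kron(z, z)

def G_analytic(x, y):
    """G = z ⊗ I + I ⊗ z, an n^2 x n matrix."""
    z = (x - y).reshape(-1, 1)
    I = np.eye(z.shape[0])
    return np.kron(z, I) + np.kron(I, z)

def jacobian_fd(fun, x, h=1e-6):
    """Central-difference Jacobian, one column per component of x."""
    cols = []
    for k in range(x.size):
        step = np.zeros_like(x)
        step[k] = h
        cols.append((fun(x + step) - fun(x - step)) / (2 * h))
    return np.column_stack(cols)

n = 3                                   # arbitrary dimension
rng = np.random.default_rng(1)
x = rng.standard_normal(n)
y = rng.standard_normal(n)
z = x - y

# Gradient: analytic formula versus finite differences.
G = G_analytic(x, y)
G_num = jacobian_fd(lambda v: f(v, y), x)
gradient_ok = np.allclose(G, G_num, atol=1e-6)

# Hessian in matrix form: H = M + w ⊗ I, with vec(G) = H z.
I = np.eye(n)
M = np.vstack([np.kron(I, I[:, [k]]) for k in range(n)])  # stacked I ⊗ e_k
w = I.reshape(-1, 1, order="F")                           # w = vec(I)
H = M + np.kron(w, I)                                     # n^3 x n
g = G.reshape(-1, order="F")                              # vec(G), column-major
hessian_ok = np.allclose(g, H @ z)

# The expression from the question, compared entry by entry with G.
candidate = -2 * np.kron(I, (y - x).reshape(-1, 1))
same_as_candidate = np.allclose(G, candidate)

print(gradient_ok, hessian_ok, same_as_candidate)
```

Because $f$ is quadratic in $x$, the central differences match the analytic $G$ up to rounding. The last flag comes out false for a generic $z$, since $z\otimes I$ and $I\otimes z$ coincide only when $z=0$; this shows that the two matrices differ as written, not which arrangement the other Hessian block should use.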