[Math] Gradient of squared Frobenius norm

derivatives, matrices, matrix-norms, normed-spaces, scalar-fields

I would like to find the gradient of $\frac{1}{2} \big \| X A^T \big \|_F^2$ with respect to $X_{ij}$. Going by the chain rule in the Matrix Cookbook (eqn 126), it's something like

$$\frac{\partial}{\partial X_{ij}} \Big[\frac{1}{2} \big\| X A^T \big\|_F^2 \Big] = \text{Tr} \Big[(XA^T)^T (J^{ij} A^T) \Big]$$

where $J^{ij}$ has the same dimensions as $X$ and has zeros everywhere except for a $1$ in entry $(i,j)$. I'm not so sure about the $J^{ij} A^T$ bit (does Cookbook eqn 66 apply here?).

Best Answer

Recall that if $A,B \in \mathbb{R}^{m \times n}$ then \begin{equation} \langle A, B \rangle = \text{Tr}(A^T B) \end{equation} and \begin{align*} \|A\|_F^2 &= \langle A,A \rangle \\ &= \text{Tr}(A^T A) \\ &= \text{Tr}(A A^T). \end{align*}
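
As a quick sanity check of these identities, here is a minimal NumPy sketch. The matrices `A` and `B` are arbitrary random matrices of an assumed size $3 \times 4$ (generic matrices as in the identities above, not the $A$ from the question), and the variable names are chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))  # arbitrary sizes, for illustration only
B = rng.standard_normal((3, 4))

# <A, B> = Tr(A^T B) is the sum of entrywise products of A and B.
inner_trace = np.trace(A.T @ B)
inner_sum = np.sum(A * B)

# ||A||_F^2 = Tr(A^T A) = Tr(A A^T)
fro_sq = np.linalg.norm(A, "fro") ** 2
trace_AtA = np.trace(A.T @ A)
trace_AAt = np.trace(A @ A.T)

print(np.isclose(inner_trace, inner_sum))
print(np.isclose(fro_sq, trace_AtA), np.isclose(fro_sq, trace_AAt))
```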

Let $f:\mathbb{R}^{m \times n} \to \mathbb{R}$ be defined by \begin{align*} f(X) &= \frac12 \| X A^T \|_F^2 \\ &= \frac12 \text{Tr}(X A^T A X^T), \end{align*} where $A$ is a fixed matrix with $n$ columns. Let $J$ be the $m \times n$ matrix whose entries are all $0$ except $J_{ij}$, which is equal to $1$. Let $\Delta X = \epsilon J$, where $\epsilon > 0$ is tiny.

Then

\begin{align*} f(X + \Delta X) &= \frac12 \text{Tr}((X + \Delta X)A^T A (X + \Delta X)^T) \\ &= \frac12 \text{Tr}(X A^T A X^T) + \frac12 \text{Tr}(\Delta X A^T A X^T) + \frac12 \text{Tr}(X A^T A \Delta X^T) \\ & \qquad + \frac12 \text{Tr}(\Delta X A^T A \Delta X^T) \\ &\approx \frac12 \text{Tr}(X A^T A X^T) + \frac12 \text{Tr}(\Delta X A^T A X^T) + \frac12 \text{Tr}(X A^T A \Delta X^T) \\ &= \frac12 \text{Tr}(X A^T A X^T) + \text{Tr}(X A^T A \Delta X^T) \\ &= f(X) + \left\langle X A^T A,\Delta X \right\rangle \\ &= f(X) + \epsilon \left \langle X A^T A,J \right\rangle. \end{align*}

Here the term $\frac12 \text{Tr}(\Delta X A^T A \Delta X^T)$ is dropped because it is of order $\epsilon^2$, and the two cross terms are equal because $\text{Tr}(M) = \text{Tr}(M^T)$ and $A^T A$ is symmetric.

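Comparing this result with the equation \begin{equation} f(X + \epsilon J) \approx f(X) + \epsilon \frac{\partial f(X)}{\partial X_{ij}} \end{equation} we see that \begin{equation} \frac{\partial f(X)}{\partial X_{ij}} = \left \langle X A^T A,J \right\rangle. \end{equation} Since $\langle M, J \rangle = M_{ij}$ for any $M \in \mathbb{R}^{m \times n}$, this is just the $(i,j)$ entry of $X A^T A$, so the full gradient is $\nabla f(X) = X A^T A$.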
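
The result can be checked numerically. The following is a minimal sketch, assuming NumPy and arbitrary sizes (`X` is $3 \times 4$, `A` is $5 \times 4$). The helper names `f`, `grad_f`, and `finite_difference_grad` are made up for this illustration. It compares $\langle X A^T A, J \rangle$, computed for all $(i,j)$ at once as the matrix $X A^T A$, against central finite differences along each $J$.

```python
import numpy as np

def f(X, A):
    """0.5 * ||X A^T||_F^2."""
    return 0.5 * np.linalg.norm(X @ A.T, "fro") ** 2

def grad_f(X, A):
    """Closed form: entry (i, j) is <X A^T A, J> = (X A^T A)[i, j]."""
    return X @ A.T @ A

def finite_difference_grad(X, A, eps=1e-6):
    """Central-difference estimate of each partial derivative df/dX_ij."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            J = np.zeros_like(X)
            J[i, j] = 1.0  # single-entry matrix, as in the derivation
            G[i, j] = (f(X + eps * J, A) - f(X - eps * J, A)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
m, n, p = 3, 4, 5  # X is m x n, A is p x n (sizes are arbitrary)
X = rng.standard_normal((m, n))
A = rng.standard_normal((p, n))

max_err = np.max(np.abs(grad_f(X, A) - finite_difference_grad(X, A)))
print(max_err)  # should be tiny (floating-point roundoff only)
```

Because $f$ is quadratic in $X$, the central difference has no truncation error, so any discrepancy comes from floating-point roundoff alone.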
