How do I find $\nabla_A\|Ax - y\|^2$, where $A \in \mathbb{R}^{n\times n}$, $x,y \in \mathbb{R}^n$, and the norm is the Euclidean norm?
Attempt so far
$$\|Ax - y\|^2 = (Ax-y)^T(Ax-y) = x^TA^TAx - 2y^TAx + y^Ty $$
$$ \nabla_A(y^TAx) = yx^T$$
Where I am stuck
I don't know how to handle the $x^TA^TAx$ term: if I try to apply the chain rule, I end up having to differentiate a matrix with respect to a matrix.
Best Answer
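Before deriving the gradient, some facts and notation for brevity. The colon denotes the Frobenius inner product, $A:B = \operatorname{tr}(A^TB) = \sum_{i,j} A_{ij}B_{ij}$, and binds more loosely than matrix products and sums, so $Ax-y:Ax-y$ means $(Ax-y):(Ax-y)$. It is symmetric, $A:B = B:A$, and for vectors $u,v$ and a matrix $M$ of compatible sizes, $u : Mv = u^TMv = uv^T : M$.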
Let $f := \left\|Ax-y \right\|^2 = Ax-y:Ax-y$.
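We first compute the differential and then read off the gradient: \begin{align} df &= d\left( Ax-y:Ax-y \right) \\ &= \left(dA \ x : Ax-y\right) + \left(Ax-y : dA \ x\right) \\ &= 2 \left(Ax - y\right) : dA \ x \\ &= 2\left( Ax-y\right)x^T : dA \end{align} The third line uses the symmetry of the colon product, and the last line uses $u : Mv = uv^T : M$ with $u = Ax-y$, $M = dA$, $v = x$.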
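The quadratic term $x^TA^TAx$ that blocks the expansion approach yields to the same differential rules, with no matrix-by-matrix derivative needed. Using $d(A^T) = (dA)^T$ and the fact that a scalar equals its own transpose,
$$ d\left(x^TA^TAx\right) = x^T(dA)^TAx + x^TA^T\,dA\,x = 2\,(Ax)^T\,dA\,x = 2\,Axx^T : dA, $$
so $\nabla_A\left(x^TA^TAx\right) = 2Axx^T$. In the expansion $\|Ax-y\|^2 = x^TA^TAx - 2y^TAx + y^Ty$, the cross term $y^TAx$ has differential $y^T\,dA\,x = yx^T : dA$, i.e. gradient $yx^T$, and $y^Ty$ does not depend on $A$. Altogether this gives $2Axx^T - 2yx^T = 2(Ax-y)x^T$, in agreement with the differential computed above.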
Thus, the gradient is \begin{align} \frac{\partial}{\partial A} \left( \left\|Ax-y \right\|^2 \right)= 2\left( Ax-y\right)x^T. \end{align}
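As a quick numerical sanity check, here is a minimal sketch assuming NumPy is available. It compares the closed form $2(Ax-y)x^T$ with an entrywise central finite-difference approximation of $\partial f/\partial A_{ij}$ on random data. The helper names (`f`, `grad_closed_form`, `grad_central_diff`) and the step size `h` are illustrative choices, not part of the original answer.

```python
import numpy as np

def f(A, x, y):
    """f(A) = ||Ax - y||^2 with the Euclidean norm (illustrative helper)."""
    r = A @ x - y
    return float(r @ r)

def grad_closed_form(A, x, y):
    """Closed-form gradient 2 (Ax - y) x^T, built as an outer product."""
    return 2.0 * np.outer(A @ x - y, x)

def grad_central_diff(A, x, y, h=1e-5):
    """Approximate each df/dA_ij by a central difference with step h (hypothetical helper)."""
    G = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            E = np.zeros_like(A)
            E[i, j] = h  # perturb only entry (i, j)
            G[i, j] = (f(A + E, x, y) - f(A - E, x, y)) / (2 * h)
    return G

# Random test data (seeded so the run is reproducible).
rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)
y = rng.standard_normal(n)

G_exact = grad_closed_form(A, x, y)
G_fd = grad_central_diff(A, x, y)

# Largest entrywise discrepancy between the two gradients.
print(np.max(np.abs(G_exact - G_fd)))
```

Because $f$ is quadratic in $A$, the second-order terms cancel in the central difference, so the printed discrepancy should be at the level of floating-point rounding rather than truncation error.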