[Math] Gradient of $\|Ax - y\|^2$ with respect to $A$

matrix-calculus, vector-analysis

How do I find $\nabla_A\|Ax - y\|^2$, where $A \in \mathbb{R}^{n\times n}$, $x,y \in \mathbb{R}^n$, and the norm is the Euclidean norm?

Attempt so far

$$\|Ax - y\|^2 = (Ax-y)^T(Ax-y) = x^TA^TAx - 2y^TAx + y^Ty$$

$$\nabla_A(y^TAx) = yx^T$$

Where I am stuck

I don't know how to tackle the $x^TA^TAx$ term: if I try to apply the chain rule, I will have to differentiate a matrix with respect to a matrix.

Best Answer

Before deriving the gradient, here are some facts and notation used for brevity:

  • Trace and Frobenius product relation $$\left\langle A, B C\right\rangle={\rm tr}(A^TBC) := A : B C$$
  • Cyclic properties of Trace/Frobenius product \begin{align} A : B C &= BC : A \\ &= A C^T : B \\ &= {\text{etc.}} \end{align}
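As a quick check of the second rearrangement, assuming the shapes are compatible so that every product is defined, it follows from the cyclic property of the trace:

$$A : BC = {\rm tr}(A^TBC) = {\rm tr}(CA^TB) = {\rm tr}\left((AC^T)^TB\right) = AC^T : B$$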

Let $f := \left\|Ax-y \right\|^2 = (Ax-y):(Ax-y)$.

Treating $x$ and $y$ as constants, so that $d(Ax-y) = dA \, x$, we can obtain the differential first, and then the gradient. \begin{align} df &= d\left( (Ax-y):(Ax-y) \right) \\ &= \left(dA \, x : (Ax-y)\right) + \left((Ax-y) : dA \, x\right) \\ &= 2 \left(Ax - y\right) : dA \, x \\ &= 2\left( Ax-y\right)x^T : dA \end{align}

Thus, the gradient is \begin{align} \frac{\partial}{\partial A} \left( \left\|Ax-y \right\|^2 \right)= 2\left( Ax-y\right)x^T. \end{align}
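A numerical sanity check can help confirm the closed form $2(Ax-y)x^T$. The sketch below is not part of the original answer: it assumes NumPy is available, and the helper names (`loss`, `grad_closed_form`, `grad_finite_diff`), the size `n = 4`, and the random test data are all illustrative choices. It compares the formula with a central finite-difference approximation taken entry by entry.

```python
import numpy as np

def loss(A, x, y):
    """Squared Euclidean norm ||Ax - y||^2."""
    r = A @ x - y
    return float(r @ r)

def grad_closed_form(A, x, y):
    """Closed-form gradient 2 (Ax - y) x^T, written as an outer product."""
    return 2.0 * np.outer(A @ x - y, x)

def grad_finite_diff(A, x, y, eps=1e-6):
    """Central finite differences, perturbing one entry of A at a time."""
    G = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            E = np.zeros_like(A)
            E[i, j] = eps
            G[i, j] = (loss(A + E, x, y) - loss(A - E, x, y)) / (2 * eps)
    return G

# Arbitrary test data (hypothetical; any real A, x, y would do).
rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)
y = rng.standard_normal(n)

G_exact = grad_closed_form(A, x, y)
G_fd = grad_finite_diff(A, x, y)
max_abs_err = float(np.max(np.abs(G_exact - G_fd)))
print("max abs difference:", max_abs_err)
```

Since the loss is quadratic in $A$, the central difference has no truncation error, so any remaining difference should come from floating-point rounding alone.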
