Derivative of a quadratic form with respect to matrix

derivatives

I have a silly question, but I can't see where I'm going wrong.

Let $x, y \in \mathbb{R}^D$ and let $A$ be a symmetric matrix. Define
$$
f(A) = (x-y)^T A (x-y).
$$
Then
$$
\frac{\partial{f}}{\partial{A}} = (x-y)(x-y)^T.
$$

But if I expand the brackets, using the symmetry of $A$, I get $f(A) = x^T A x - 2x^T A y + y^T A y$. Then its derivative is
$$
\frac{\partial{f}}{\partial{A}} = xx^T - 2xy^T + yy^T
$$

which is not equal to the derivative in the first case in general. Why is that?

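Here is how I calculated the derivative in the first case:
$$
\begin{aligned}
df &= d\big((x-y)^T A (x-y)\big) = d\operatorname{tr}\big((x-y)^T A (x-y)\big) = \operatorname{tr}\big(d((x-y)^T A (x-y))\big) \\
&= \operatorname{tr}\big((x-y)^T (dA) (x-y)\big) = \operatorname{tr}\big((x-y)(x-y)^T \, dA\big) = \langle (x-y)(x-y)^T, dA \rangle
\end{aligned}
$$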

The derivative is $(x-y)(x-y)^T$.

Best Answer

You made a symmetry assumption when you expanded your expression for $f$, but then didn’t account for it in your derivative.

Recall that the derivative with respect to a symmetric matrix should itself be symmetric (prove this if you don’t see it), and therefore your expression should satisfy

$$\frac{\partial f}{\partial A} = \frac{1}{2}\left(\frac{\partial f}{\partial A} + \left( \frac{\partial f}{\partial A}\right)^\intercal\right).$$

Applying this to your second expression directly yields the first.
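
To spell that step out, write $G = xx^\intercal - 2xy^\intercal + yy^\intercal$ for the second expression (the symbol $G$ is introduced here only as shorthand and is not in the original question). Symmetrizing gives

$$
\frac{1}{2}\left(G + G^\intercal\right) = \frac{1}{2}\left(2xx^\intercal - 2xy^\intercal - 2yx^\intercal + 2yy^\intercal\right) = xx^\intercal - xy^\intercal - yx^\intercal + yy^\intercal = (x-y)(x-y)^\intercal,
$$

which is the first expression.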

Edit: Another way to see this is that both expressions are in fact your derivative, because both define the same linear map on symmetric matrices. They differ only by the antisymmetric matrix $xy^\intercal - yx^\intercal$, whose inner product with any symmetric matrix is zero. This makes sense because your mapping is the derivative with respect to a symmetric matrix, which means the only relevant “direction vectors” it can act on are already symmetric.
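
If you want to convince yourself numerically, here is a minimal NumPy sketch. The dimension `D = 4`, the random seed, and the names `G1`, `G2`, `inner`, `H_sym`, `H_gen` are all arbitrary choices made for this illustration and are not part of the question. It checks that the two candidate matrices differ, that they give the same directional derivative along a symmetric direction, and that symmetrizing the second recovers the first.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the check is reproducible
D = 4                            # arbitrary dimension for the demo
x = rng.standard_normal(D)
y = rng.standard_normal(D)

# The two candidate derivatives from the question.
G1 = np.outer(x - y, x - y)
G2 = np.outer(x, x) - 2 * np.outer(x, y) + np.outer(y, y)


def inner(G, H):
    """Frobenius inner product <G, H> = tr(G^T H)."""
    return float(np.sum(G * H))


M = rng.standard_normal((D, D))
H_sym = M + M.T   # a symmetric direction
H_gen = M         # a generic, non-symmetric direction

# As matrices the two candidates differ, and the difference is antisymmetric.
diff = G1 - G2
matrices_differ = not np.allclose(G1, G2)
diff_is_antisymmetric = np.allclose(diff, -diff.T)

# On symmetric directions both give the same directional derivative.
agree_on_symmetric = np.isclose(inner(G1, H_sym), inner(G2, H_sym))

# Symmetrizing the second candidate recovers the first.
symmetrized_matches = np.allclose(0.5 * (G2 + G2.T), G1)

print(matrices_differ, diff_is_antisymmetric, agree_on_symmetric, symmetrized_matches)
print(inner(G1, H_gen), inner(G2, H_gen))  # these generally differ
```

The last line is only there to show that the two matrices stop agreeing once the direction is allowed to be non-symmetric, which is exactly the case the symmetry constraint on $A$ rules out.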