Derivative and chain rule for a trace

Tags: chain-rule, derivatives, matrix-calculus, trace

Suppose $\mathbf{x}\in \mathbb{R}^{1 \times d}$ is a row vector of size (1,d), and $\mathbf{T} \in \mathbb{R}^{d \times d}$. How do I compute the derivative of the following expression with respect to $\mathbf{x}$?
$$\begin{align}
\textrm{tr}(\ (\mathbf{x}^{\prime}\mathbf{x}-\mathbf{T})^{\prime} (\mathbf{x}^{\prime}\mathbf{x}-\mathbf{T})\ )
\end{align}$$

My attempt uses the chain rule:

Let $L = \textrm{tr}(\ (\mathbf{x}^{\prime}\mathbf{x}-\mathbf{T})^{\prime} (\mathbf{x}^{\prime}\mathbf{x}-\mathbf{T})\ )$.

$$
\begin{aligned}
\frac{\partial{L}}{\partial{\mathbf{x}}}
&=\frac{\partial{L}}{\partial{(\mathbf{x}^{\prime}\mathbf{x}-\mathbf{T})}} \cdot \frac{\partial(\mathbf{x}^{\prime}\mathbf{x}-\mathbf{T})}{\partial \mathbf{x}}\\
&= 2(\mathbf{x}^{\prime}\mathbf{x}-\mathbf{T}) \cdot (2\mathbf{x}^{\prime})
\end{aligned}
$$

Is it correct?

Best Answer

If $T$ is symmetric, then your result is OK; otherwise it is a bit off. I wanted to take the chance to outline a general procedure that works well for this kind of problem. I hope you like it.

The derivative

For this kind of task you can use the definition of the derivative. If $x$ is a row vector, and $f$ is a differentiable real-valued function that eats row vectors, then the derivative $Df(x)$ of $f$ at $x$ is the linear part of the change in $f(x)$ when $x$ is incremented by another row vector $h$. That is, $Df(x)$ is the unique column vector that satisfies $\def\tr{\mathrm{tr}}$

$$f(x+h) = f(x) + h\,Df(x) + O(h^2), \tag 1$$ where $O(h^2)$ collects the terms in which $h$ appears with order $\geq 2$.
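Definition $(1)$ can be illustrated numerically. The following is only a minimal sketch: the test function $f(x)=xx'$ (squared length), its candidate derivative $2x'$, and the names `f`, `Df` and `remainder` are my own choices for illustration and do not come from the question. If the candidate is right, the leftover $f(x+h)-f(x)-h\,Df(x)$ should shrink like the square of the step size.

```python
import numpy as np

def f(x):
    """Toy function (an assumption for illustration): f(x) = x x' for a row vector x."""
    return (x @ x.T).item()

def Df(x):
    """Candidate derivative of the toy function: the column vector 2 x'."""
    return 2.0 * x.T

def remainder(x, h):
    """f(x+h) - f(x) - h Df(x), i.e. the part that definition (1) calls O(h^2)."""
    return f(x + h) - f(x) - (h @ Df(x)).item()

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 4))  # row vector of shape (1, d)
h = rng.standard_normal((1, 4))  # direction of the increment

# Shrinking the step by 10 should shrink the remainder by about 100.
for t in (1e-1, 1e-2, 1e-3):
    print(t, remainder(x, t * h))
```

For this toy function the remainder is exactly $hh'$, so it is purely second order, which is what $(1)$ requires of a correct $Df(x)$.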

Useful notation

If $A$ and $B$ are matrices, a useful notation is

$$ A : B = \tr(A'B) = \tr (BA') = \tr(B'A) = \tr(AB') = \sum_{ij}A_{ij}B_{ij}.$$

Clearly this product commutes and is unchanged when you transpose both factors. Moreover, if $v$ is a row and $w$ a column, write

$$w : v = v : w = vw = w'v'.$$

Why? So that for row vectors $x,h$ you have the nice properties

$$x:Ah' = xA:h' = h':xA = x'h:A = A:x'h = Ah':x = \sum_{ij}x_iA_{ij}h_j$$

and the result is still unchanged when both factors are transposed. Mind the order in the outer product: $x'h$ has entries $x_ih_j$, whereas $h'x:A = \sum_{ij}h_iA_{ij}x_j = xA'h'$.
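The identities for the colon product are easy to spot-check with random matrices. This is a sketch under my own assumptions: the helper name `colon` is hypothetical, and the row/column convention $v:w = vw$ is implemented by transposing one factor so that both arrays have the same shape before the elementwise sum.

```python
import numpy as np

def colon(A, B):
    """Frobenius product A : B = sum_ij A_ij B_ij for two arrays of equal shape."""
    return float(np.sum(A * B))

rng = np.random.default_rng(1)
d = 3
A = rng.standard_normal((d, d))
B = rng.standard_normal((d, d))
x = rng.standard_normal((1, d))  # row vector
h = rng.standard_normal((1, d))  # row vector

# The four trace forms of A : B.
traces = [
    np.trace(A.T @ B),
    np.trace(B @ A.T),
    np.trace(B.T @ A),
    np.trace(A @ B.T),
]

# Forms that should all equal x A h' = sum_ij x_i A_ij h_j.
xAh = (x @ A @ h.T).item()
forms = [
    colon(x.T, A @ h.T),  # x : Ah'  (x transposed so both factors are columns)
    colon(x @ A, h),      # xA : h'  (h' transposed back so both factors are rows)
    colon(x.T @ h, A),    # x'h : A  (outer product with entries x_i h_j)
]

print(colon(A, B), traces)
print(xAh, forms)
```

The outer product taken in the other order, `h.T @ x`, pairs with `A.T` instead of `A` to give the same number.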

Now to your problem

Your $x$ is a row vector. Define $A(x)=x'x-T$. Then your function is

$$f(x)=A(x):A(x).$$

Note that

$$ \begin{align} A(x+h) &= (x'+h')(x+h)-T \\ &= x'x+h'x+x'h-T+O(h^2) \\ &= A(x)+h'x+x'h+O(h^2). \end{align} $$

Hence

$$ \begin{align} f(x+h) &= A(x+h):A(x+h) \\ &= [A(x)+h'x+x'h]:[A(x)+h'x+x'h]+O(h^2) \\ &= A(x):A(x) + 2 A(x):[h'x+x'h] + O(h^2) \\ &= f(x) + 2 A(x):[h'x+x'h] + O(h^2). \end{align} $$

Now I will write $A=A(x)$, since $A(x+h)$ no longer appears. Using $A:h'x = \sum_{ij}h_iA_{ij}x_j = Ax':h$ and $A:x'h = \sum_{ij}x_iA_{ij}h_j = A'x':h$,

$$ \begin{align} f(x+h)-f(x) &= 2[A:h'x + A:x'h] + O(h^2) \\ &= 2[Ax':h + A'x':h] + O(h^2) \\ &= 2h:(Ax' + A'x') + O(h^2) \\ &= 2h:(A' + A)x' + O(h^2). \end{align} $$

Comparing this with $(1)$, you get

$$Df(x) = 2(A' + A)x' = 2(2x'x-T-T')x'.$$
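A finite-difference check is a cheap way to confirm a formula like $Df(x) = 2(A'+A)x'$. The sketch below assumes NumPy; the function names, the random test data and the step size `eps` are my own choices, not part of the question. $T$ is deliberately generated without any symmetry.

```python
import numpy as np

def f(x, T):
    """f(x) = tr((x'x - T)'(x'x - T)) for a row vector x of shape (1, d)."""
    A = x.T @ x - T
    return float(np.sum(A * A))  # A : A

def grad_f(x, T):
    """Derived result Df(x) = 2 (A' + A) x', a column vector of shape (d, 1)."""
    A = x.T @ x - T
    return 2.0 * (A.T + A) @ x.T

def numerical_grad(x, T, eps=1e-6):
    """Central differences, perturbing one coordinate of x at a time."""
    d = x.shape[1]
    g = np.zeros((d, 1))
    for j in range(d):
        e = np.zeros_like(x)
        e[0, j] = eps
        g[j, 0] = (f(x + e, T) - f(x - e, T)) / (2.0 * eps)
    return g

rng = np.random.default_rng(2)
d = 4
x = rng.standard_normal((1, d))
T = rng.standard_normal((d, d))  # not symmetric in general

# Largest absolute disagreement between the formula and the numerical gradient.
print(np.max(np.abs(grad_f(x, T) - numerical_grad(x, T))))
```

The printed discrepancy should be tiny compared with the size of the gradient entries; a large value would point to a sign or transpose mistake in the formula.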

Another way

This one is more informal, but can be quite handy. Starting from

$$f(x)=A:A$$

and taking differentials (transposing both factors in the first term of the third line):

$$ \begin{align} df &= 2A:dA \\ &= 2A:d(x'x) \\ &= 2(A:dx'x + A:x'dx) \\ &= 2(A':x'dx + A:x'dx) \\ &= 2(Ax':dx + A'x':dx) \\ &= 2(A' + A)x':dx, \end{align} $$

hence

$$\frac{df}{dx} = 2(A'+A)x'$$

as before.
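To see concretely why the question's answer is fine for symmetric $T$ but off otherwise, one can compare it with the general formula on random data. This is a sketch under an assumption of mine: I read the question's expression $2(x'x-T)\cdot(2x')$ as the matrix product $4Ax'$. The function names are hypothetical.

```python
import numpy as np

def residual(x, T):
    """A(x) = x'x - T for a row vector x of shape (1, d)."""
    return x.T @ x - T

def grad_general(x, T):
    """2 (A' + A) x', the result derived above."""
    A = residual(x, T)
    return 2.0 * (A.T + A) @ x.T

def grad_question(x, T):
    """The question's expression, read as the matrix product 2 A (2 x') = 4 A x'."""
    return 4.0 * residual(x, T) @ x.T

rng = np.random.default_rng(3)
d = 4
x = rng.standard_normal((1, d))
M = rng.standard_normal((d, d))
T_sym = M + M.T  # symmetric by construction
T_gen = M        # no symmetry imposed

print(np.allclose(grad_general(x, T_sym), grad_question(x, T_sym)))  # symmetric T
print(np.allclose(grad_general(x, T_gen), grad_question(x, T_gen)))  # general T
```

The two expressions differ by $2(A'-A)x' = 2(T-T')x'$, which vanishes exactly when the antisymmetric part of $T$ annihilates $x'$, in particular when $T$ is symmetric.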